An iterative algorithm, called the Variable-Order (VO) algorithm, is derived for computing a local minimum of a nonlinear function of several independent variables. The VO algorithm is shown to be very competitive with several existing algorithms. The class of functions for which the algorithm is globally convergent is established. It is shown that the VO algorithm converges with order as high as four. A major step of the algorithm is the solution of a scalar problem that may be along curved trajectories in the space of the independent variables, instead of along straight lines as in most existing algorithms. Approximations to required higher-order derivatives are given which allow the use of the VO algorithm even if only function values can be supplied. If the supplied function and gradient values are somewhat inaccurate, as is often the case in computer-aided circuit optimization procedures, the VO algorithm is shown to be still effective. The algorithm may be used to compute a solution of a general nonlinear programming problem by the use of penalty functions.

Special  cases  of  the  VO  algorithm  are  used  to  develop  an  iterative 
method  to  solve  nonlinear  equations  that  arise  in  circuit  analysis  with 
resulting  modest  improvements  in  efficiency.  The  new  method  is  applied 
to  transient  and  dc  analysis  of  nonlinear  MOSFET  and  bipolar  circuits. 

CHAPTER 1
INTRODUCTION 

It  has  been  nine  years  since  publication  of  the  first,  by  now 
almost  classical,  special  issue  of  the  Proceedings  of  the  IEEE  [57]  on 
computer-aided  design  of  circuits.   That  issue  marked  the  beginning  of 
a  trend  to  use  the  computer  as  an  active  partner  in  design  rather  than 
simply  in  a  passive  role  for  simulation.   The  circuits  at  that  time 
might  be  described  as  being  in  the  infancy  of  integrated  electronics. 
The  most  complex  integrated  circuit  chip  might  have  consisted  of  ten 
devices;  it  was  still  somewhat  feasible  to  use  breadboarding  techniques 
as  aids  in  the  design. 

During  the  ensuing  nine  years,  several  more  journals  have  been 
devoted  to  the  subject  of  computer-aided  design  of  circuits,  among  them 
[58-61],  and  integrated  electronics  has  matured  to  the  point  where  large- 
scale  integrated  circuits  which  contain  thousands  of  devices  can  be 
manufactured.   Breadboarding  of  circuits  has  generally  ceased  to  be  very 
useful  as  a  design  tool,  and  the  computer  has  become  indispensable  in 
the  entire  design  process.   The  computer  is  being  used  at  many  stages, 
and for many different purposes, during the design of circuits. We will be concerned with the use of the computer to optimize a circuit by varying some designable parameters in order to achieve the design objectives.

In circuit optimization, a scalar performance function representing the design objectives is minimized. There are two principal computational steps in this procedure. First, a numerical minimization algorithm must be employed to adjust the designable parameters in order to minimize the scalar performance function. Minimization algorithms begin from an initial point, a guess at the optimum set of designable parameters, and generate a sequence of points which, it is hoped, converges to the minimum of the scalar performance function. Most minimization algorithms require the numerical values of the performance function and its gradient with respect to the designable parameters evaluated at several points during the minimization procedure. The computational cost of the minimization is generally proportional to the number of function and gradient evaluations required.

Second,  the  evaluation  of  the  scalar  performance  function  and  its 
gradient  generally  involves  analysis  of  the  circuit  equations,  a  rather 
large  set  of  nonlinear  algebraic  and  differential  equations.   The  large 
number  of  circuit  equations  is  not  only  due  to  the  increased  circuit 
size,  but  also  due  to  the  use  of  more  complex  device  models  for  improved 
accuracy. 

The  goals  of  this  research  were  to  improve  the  efficiency  of  the 
two  computational  steps  described  above.   The  major  accomplishments  were 
the  derivation  of  a  very  promising  new  minimization  algorithm,  and  a  new 
iterative  method  to  solve  the  nonlinear  algebraic  equations  that  arise 
in  the  analysis  of  circuits. 

The main contribution of this research is the development of a new minimization algorithm. Although the new algorithm has some shortcomings, numerical results on several examples show that it is quite accurate, and more efficient than other existing algorithms for the majority of the examples tried. From a theoretical standpoint the algorithm has two novel features: 1) it has a variable order of convergence up to order four, and 2) it has a scalar search at each iteration which may be along curved trajectories in the space of the independent variables.

A second contribution of this research is an iterative method for solving the nonlinear equations that arise in the transient analysis of the circuit equations. Modest improvements in efficiency were obtained when the new iterative method was implemented in an already very efficient transient analysis program.

Other  minor  contributions  can  be  summarized  as  follows: 

1) A potentially useful Taylor series expansion of the solution point of a system of nonlinear equations. Different forms of the series, when truncated, yield different iterative methods. An iterative method was used to obtain dc solutions of the circuit equations.

2) A modified Cholesky factorization of a symmetric matrix, the hessian, which in effect modifies the matrix when it is not positive definite. The new factorization is a modification of a previously proposed technique [36].

3) An apparently novel scheme for computing difference approximations to first and second derivatives, the gradient and the hessian. The scheme automatically takes into consideration errors that may be present in the function values that are used in the difference approximations.

4)  A  new  method  of  describing  minimization  algorithms  to  account 
for  other  than  straight-line  directions  of  search.   Existing 
theorems  are  extended  to  the  new  description. 

The organization of the chapters is the following. In Chapter 2 we offer a brief theoretical and computational view of computer-aided circuit design. In that chapter a brief historical review of minimization algorithms is also given. In Chapter 3 the new minimization algorithm is derived and its theoretical properties are established. In Chapter 4, implementation of the new algorithm is described. In addition, several examples and comparisons with other algorithms are reported. In Chapter 5 the concepts used in the derivation of the minimization algorithm are used to derive iterative methods for finding solutions to nonlinear equations. Finally, Chapter 6 offers general conclusions and some suggestions for further research.


CHAPTER  2 
A  VIEW  OF  COMPUTER-AIDED  CIRCUIT  DESIGN 

A successful approach in using the computer as a circuit design tool has been to minimize a scalar performance function, which represents the design objectives, by adjusting in some suitable fashion the designable parameters. This procedure requires many steps which will be briefly outlined in this chapter.

The first step in using optimization for circuit design is to recast the problem into a nonlinear programming problem by characterizing the qualitative design objectives by a scalar performance function with constraints. This step is quite heuristic and requires great insight on the part of the circuit designer. After this initial step, the computer takes over by approximating the solution of the nonlinear programming problem.

The derivation and the computational steps involved in the evaluation of the scalar performance function and its gradient, needed for solving the nonlinear programming problem, will be briefly outlined. The chapter ends with a brief historical review of existing methods for solving the unconstrained minimization problem which results from the nonlinear programming problem.

2.1  Nonlinear  Programming  Circuit  Problem 

Although there have been some fairly successful attempts at synthesis, where a circuit is "grown" to meet specifications [43,6], it may be safely stated that this problem has not been solved satisfactorily. Thus it will be assumed that a circuit which somewhat adequately meets the design objectives is available, e.g. from a catalog of circuits. It is then desired to change parameters of this circuit so that it better meets its intended objective.

The design objectives of a circuit are usually specified in a somewhat qualitative manner. Some of the objectives are to minimize certain quantities, such as power dissipation, time delays, circuit size, or the difference between a desired voltage or current curve and the actual curve. Other objectives may be described as constraints on the solutions or on the designable parameters, such as voltage or current levels less than (or greater than) some value, propagation delays no larger than some value, low and high limits, called box constraints, on the designable parameters, etc. It is the circuit designer's job to translate the usually unspecific design objectives into a set of specific scalar functions. Often this specification step yields several scalar functions to be minimized, several constraint functions, and box constraints on the designable parameters.

Most  circuit  optimization  programs  require  that  all  the  scalar 
functions  to  be  minimized  be  combined  in  some  manner  to  obtain  a  single 
scalar  performance  or  objective  function  to  be  minimized.   The  simplest 
technique  for  accomplishing  this  combination  is  to  choose  a  performance 
function  which  is  a  weighted  sum  of  all  the  functions  to  be  minimized. 
The  general  form  of  such  a  performance  function  is 


\[ f = \int_{0}^{T} e(w, q, x, t)\,dt , \tag{2.1} \]

where w is a vector of branch voltages, branch currents and node voltages of the circuit; q is a vector of capacitance charges and inductor fluxes; x is the vector of time-independent designable parameters, e.g. geometries of each device in the circuit; and t is time. The scalar function e represents a numerical compromise of the design objectives in that a different combination of the functions to be minimized yields a different function e, and thus a different minimization problem. As published applications have shown [28,12,5], this numerical compromise implies that circuit design carried out in this manner may require several minimizations to achieve the best design, as interpreted by the circuit designer.
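
As a rough illustration of how a weighted sum such as (2.1) behaves, the sketch below integrates a weighted combination of two toy integrands with the trapezoidal rule. The objectives, the weights, and all function names are hypothetical and chosen only for illustration; they are not taken from any circuit in the text.

```python
# Sketch of a weighted-sum performance function f = integral_0^T e dt,
# where e(t) = sum_i c_i * g_i(t). All names and the toy integrands below
# are assumptions made for illustration only.

def trapezoid(values, dt):
    """Composite trapezoidal rule for equally spaced samples."""
    return dt * (sum(values) - 0.5 * (values[0] + values[-1]))

def performance(weights, objectives, times):
    """Integrate the weighted sum of the objective integrands over `times`."""
    dt = times[1] - times[0]
    e = [sum(c * g(t) for c, g in zip(weights, objectives)) for t in times]
    return trapezoid(e, dt)

# Two toy integrands on 0 <= t <= 1: a squared tracking error and a "power" term.
def tracking_error(t):
    return (t - 0.5) ** 2

def power(t):
    return 2.0 * t

times = [i / 1000.0 for i in range(1001)]
f_a = performance([1.0, 0.0], [tracking_error, power], times)  # tracking only
f_b = performance([1.0, 0.1], [tracking_error, power], times)  # adds weighted power
```

Changing the weights changes the function being minimized, which is the numerical compromise referred to above: each choice of weights defines a different minimization problem.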

The  minimization  of  (2.1)  must  be  carried  out  subject  to  the  circuit 
designer  constraints  and  subject  to  the  circuit  equation  constraints, 
which  is  a  nonlinear  programming  problem.   This  problem  may  be  described 
as  follows 


\[ \min_{w,\,q,\,x} \; f(w, q, x) = \int_{t_0}^{T} e(w, q, x, t)\,dt , \tag{2.2a} \]

\[ \text{subject to} \quad u(w, q, x, t) \le 0 , \tag{2.2b} \]

\[ x^{L} \le x \le x^{H} , \tag{2.2c} \]

\[ H(w, q, x, t) = 0 , \tag{2.2d} \]

\[ E w - \dot{q} = 0 , \quad q(t_0) = q_0(x) , \tag{2.2e} \]

where (2.2b) are the designer's nonlinear constraints, (2.2c) are the box constraints, and (2.2d) with (2.2e) represent the circuit equations.
The circuit equations, which can also be expressed by

\[ \begin{bmatrix} H(w, q, x, t) \\ E w - \dot{q} \end{bmatrix} = 0 , \quad q(0) = q_0(x) , \tag{2.3} \]

where the initial conditions $q_0(x)$ may be functions of the designable parameters, consist of Kirchhoff-law equations and the branch constitutive equations [15,17].

The  box  constraints  (2.2c)  are  usually  handled  by  transformations 
[8],  or  directly  as  done  by  the  algorithm  to  be  given  in  Chapters  3  and 
4.   The  nonlinear  constraints  (2.2b)  are  usually  made  part  of  the  e 
function  in  (2.2a)  by  using  penalty  functions  as  described  in  Chapter  4. 
Therefore  we  will  discuss  the  numerical  and  theoretical  considerations 
in  solving  the  problem  given  by 


\[ \min_{w,\,q,\,x} \; f(w, q, x) = \int_{0}^{T} e(w, q, x, t)\,dt \tag{2.3a} \]

\[ \text{subject to} \quad H(w, q, x, t) = 0 \tag{2.3b} \]

\[ E w - \dot{q} = 0 , \quad q(0) = q_0(x) \tag{2.3c} \]

where without loss of generality $t_0 = 0$ is assumed.
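
To make the role of penalty functions concrete, the following sketch folds an inequality constraint of the form u <= 0 into the function being minimized by means of an exterior quadratic penalty. This is one common textbook choice and is only an assumption here; the penalty functions actually used with the new algorithm are those described in Chapter 4. The names `penalized`, `e`, `u`, and the parameter `r` are hypothetical.

```python
# Sketch of an exterior quadratic penalty (an assumed, generic choice):
#   e_r(z) = e(z) + r * sum_i max(0, u_i(z))**2
# Feasible points (u_i <= 0) are not penalized; violations cost r * u_i**2.

def penalized(e, constraints, r):
    """Return the penalized function e_r for penalty weight r > 0."""
    def e_r(z):
        return e(z) + r * sum(max(0.0, u(z)) ** 2 for u in constraints)
    return e_r

# Toy problem: minimize (z - 2)^2 subject to u(z) = z - 1 <= 0.
def e(z):
    return (z - 2.0) ** 2

def u(z):
    return z - 1.0

e_10 = penalized(e, [u], 10.0)

# For z > 1 the stationary point of e_r is z_r = (2 + r) / (1 + r),
# which approaches the constrained minimizer z = 1 as r grows.
def unconstrained_minimizer(r):
    return (2.0 + r) / (1.0 + r)
```

The sketch shows why the constrained problem reduces to a sequence of unconstrained minimizations: for each value of r an unconstrained minimizer exists, and it moves toward the constraint boundary as r is increased.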


2.2  Derivation  of  the  Gradient 


Any solution of (2.3) must satisfy the necessary condition that the first variational (or gradient) of the Lagrangian vanishes [8,29]. The Lagrangian is given by

\[ L(w, q, x, \hat{w}, \hat{q}) = \int_{0}^{T} \left[ e(w, q, x, t) + \hat{w}^{T} H(w, q, x, t) + \hat{q}^{T} (E w - \dot{q}) \right] dt , \tag{2.4} \]

where $\hat{w}(t)$ and $\hat{q}(t)$ are the Lagrange multiplier vectors, which are functions of time. The first variational of L is given by


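\[
\begin{aligned}
\delta L ={}& \int_{0}^{T} \left( \frac{\partial e}{\partial w} + \hat{w}^{T} \frac{\partial H}{\partial w} + \hat{q}^{T} E \right) \delta w \, dt \\
&+ \int_{0}^{T} \left( \frac{\partial e}{\partial q} + \hat{w}^{T} \frac{\partial H}{\partial q} + \dot{\hat{q}}^{T} \right) \delta q \, dt - \hat{q}^{T} \delta q \,\Big|_{0}^{T} \\
&+ \int_{0}^{T} \left( \frac{\partial e}{\partial x} + \hat{w}^{T} \frac{\partial H}{\partial x} \right) \delta x \, dt \\
&+ \int_{0}^{T} \delta \hat{w}^{T} H \, dt + \int_{0}^{T} \delta \hat{q}^{T} (E w - \dot{q}) \, dt ,
\end{aligned}
\tag{2.5}
\]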


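where the variational term in (2.4) involving $\dot{q}$ is obtained using integration by parts as follows

\[
\begin{aligned}
\delta \left( -\int_{0}^{T} \hat{q}^{T} \dot{q} \, dt \right)
&= -\int_{0}^{T} \delta \hat{q}^{T} \dot{q} \, dt - \int_{0}^{T} \hat{q}^{T} \delta \dot{q} \, dt \\
&= -\int_{0}^{T} \delta \hat{q}^{T} \dot{q} \, dt + \int_{0}^{T} \dot{\hat{q}}^{T} \delta q \, dt - \hat{q}^{T} \delta q \,\Big|_{0}^{T} .
\end{aligned}
\tag{2.6}
\]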


In order to satisfy the necessary conditions one must find time-dependent vectors $w(t)$, $q(t)$, $\hat{w}(t)$, and $\hat{q}(t)$, and a time-independent vector x to satisfy

\[ \delta L = 0 . \tag{2.7} \]


The problem of satisfying (2.7) normally has an extremely large dimension. The dimension can be reduced substantially if one makes two assumptions. First, if the circuit equations can be satisfied at all values of x, then given an $x = x^k$, we can obtain $w^k(t)$ and $q^k(t)$ such that the circuit equations

\[ H(w^k, q^k, x^k, t) = 0 \tag{2.8a} \]

\[ E w^k - \dot{q}^k = 0 , \quad q^k(0) = q_0(x^k) \tag{2.8b} \]


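are satisfied. Second, if the Jacobian operator of the circuit equations given by

\[ \mathcal{J} = \begin{bmatrix} \dfrac{\partial H}{\partial w} & \dfrac{\partial H}{\partial q} \\[2mm] E & -I \dfrac{d}{dt} \end{bmatrix} \tag{2.9} \]

is invertible in the interval $0 \le t \le T$, then the Lagrange multiplier vectors $\hat{w}^k(t)$ and $\hat{q}^k(t)$ can be computed from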


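\[ (\hat{w}^k)^{T} \frac{\partial H^k}{\partial w} + (\hat{q}^k)^{T} E = -\frac{\partial e^k}{\partial w} , \tag{2.10a} \]

\[ (\hat{w}^k)^{T} \frac{\partial H^k}{\partial q} - \frac{d}{d\tau} (\hat{q}^k)^{T} = -\frac{\partial e^k}{\partial q} , \quad \hat{q}^k(\tau = 0) = 0 , \tag{2.10b} \]

where $\tau = T - t$ (note that $\hat{q}^k(\tau = 0) = \hat{q}^k(t = T)$). These are the conditions obtained by setting the coefficients of $\delta w$ and $\delta q$ in (2.5) to zero, with $d/dt = -d/d\tau$. The equations in (2.10) may be written in matrix form as follows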


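\[ \begin{bmatrix} \dfrac{\partial H^k}{\partial w} & \dfrac{\partial H^k}{\partial q} \\[2mm] E & -I \dfrac{d}{d\tau} \end{bmatrix}^{T} \begin{bmatrix} \hat{w}^k \\ \hat{q}^k \end{bmatrix} = - \begin{bmatrix} \left( \dfrac{\partial e^k}{\partial w} \right)^{T} \\[2mm] \left( \dfrac{\partial e^k}{\partial q} \right)^{T} \end{bmatrix} \tag{2.11} \]

which can be solved because of the second assumption. Observe that the solution to (2.11) is carried out in $\tau$, where $\tau = T - t$, which is backwards in the original time variable. These two assumptions are reasonable since the designable parameters are normally constrained by the box constraints, so that an actual physical circuit is always obtained, and therefore the circuit equations should always have a solution.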

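With the preceding two assumptions, using (2.8) and (2.10), the variational $\delta L$ in (2.5) at $x = x^k$ becomes

\[ \delta L = \int_{0}^{T} \left( \frac{\partial e^k}{\partial x} + (\hat{w}^k)^{T} \frac{\partial H^k}{\partial x} \right) \delta x \, dt + (\hat{q}^k)^{T}(0) \, \delta q^k(0) . \tag{2.12} \]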


This variational may be expressed as the gradient with respect to x evaluated at $x^k$ by

\[ \left. \frac{\partial L}{\partial x} \right|_{x = x^k} = \int_{0}^{T} \left[ \frac{\partial e^k}{\partial x} + (\hat{w}^k)^{T} \frac{\partial H^k}{\partial x} \right] dt + (\hat{q}^k)^{T}(0) \, \frac{\partial q^k(0)}{\partial x} \tag{2.13} \]

since x is not a function of time. This expression implies that the problem has been reduced in dimension. We can now use any unconstrained minimization algorithm which would vary only the designable parameters by using function and gradient values of the Lagrangian, now effectively a function of only x.


2.3  Computational  Flow 

Most iterative minimization algorithms require that an initial guess to the solution point be given, and that the values of the function and the gradient be supplied to the algorithm at points generated by the algorithm (the next section and Chapters 3 and 4 offer a more detailed description of minimization algorithms). Thus at any value of $x = x^k$ given by the minimization algorithm, one must supply L and the gradient of L with respect to x, both evaluated at $x = x^k$. Using the derivation in the preceding section, the computational flow will now be described. For notational convenience, the superscript k will be dropped.

STEP 1. Determine q(0). These are the initial conditions. There are three possibilities: 1) $q(0) = q_0$, where $q_0$ is a constant vector; 2) $q(0) = q_0$, where $q_0$ is part of the designable parameters, as in a periodic steady-state problem [13]; and 3) q(0) is computed from a dc analysis of the circuit equations. That is, q(0) and w(0) satisfy

\[ H(w(0), q(0), x) = 0 , \tag{2.14a} \]

\[ E w(0) = 0 . \tag{2.14b} \]
In Chapter 5, dc analysis is discussed further and a new algorithm is given.

STEP 2. Compute an approximation to w(t) and q(t) by a transient analysis from t = 0 to t = T of the circuit equations to satisfy

\[ H(w, q, x, t) = 0 , \tag{2.15a} \]

\[ E w - \dot{q} = 0 , \tag{2.15b} \]

with q(0) obtained from STEP 1. In this transient analysis, the value of the Lagrangian (2.4) can be computed, which due to (2.15) is now given by

\[ L = \int_{0}^{T} e(w, q, x, t)\,dt . \tag{2.16} \]

Transient analysis of the circuit equations is described in more detail in Chapter 5.
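
STEP 3. Compute an approximation to the Lagrange multipliers $\hat{w}(t)$ and $\hat{q}(t)$ by a transient analysis from $\tau = T - t = 0$ to $\tau = T - t = T$ (i.e., t running backwards) to satisfy

\[ \hat{w}^{T} \frac{\partial H}{\partial w} + \hat{q}^{T} E = -\frac{\partial e}{\partial w} , \tag{2.17a} \]

\[ \hat{w}^{T} \frac{\partial H}{\partial q} - \frac{d}{d\tau} \hat{q}^{T} = -\frac{\partial e}{\partial q} , \tag{2.17b} \]

with $\hat{q}(\tau = 0) = 0$. In this transient analysis, compute the vector

\[ \left( \frac{\partial L}{\partial x} \right)_{DY} = \int_{0}^{T} \left[ \frac{\partial e}{\partial x} + \hat{w}^{T} \frac{\partial H}{\partial x} \right] dt \tag{2.18} \]

which is the dynamic portion of the gradient (see (2.13)).
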
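STEP 4. Compute the equilibrium portion of the gradient given by

\[ \left( \frac{\partial L}{\partial x} \right)_{EQ} = \hat{q}^{T}(0) \, \frac{\partial q(0)}{\partial x} . \tag{2.19} \]

The initial conditions q(0) are determined by one of the three possibilities outlined in STEP 1. The value of (2.19) therefore also has three possibilities: 1) if $q(0) = q_0$, where $q_0$ is constant, then the term (2.19) is zero; 2) if $q(0) = q_0$, where $q_0$ is obtained from a subset of the designable parameter vector [1,16], that is, if

\[ x = ( x_d^{T} \;\; q_0^{T} )^{T} \]

with $q(0) = q_0$, then

\[ \left( \frac{\partial L}{\partial x} \right)_{EQ} = ( 0^{T} \;\; \hat{q}^{T}(0) ) \]

in this case; and 3) if q(0) is computed from a dc analysis satisfying (2.14), then this term requires additional work. Differentiating (2.14) with respect to x, and expressing the result in matrix notation yields

\[ \begin{bmatrix} \dfrac{\partial H}{\partial w} & \dfrac{\partial H}{\partial q} \\[2mm] E & 0 \end{bmatrix} \begin{bmatrix} \dfrac{\partial w(0)}{\partial x} \\[2mm] \dfrac{\partial q(0)}{\partial x} \end{bmatrix} = - \begin{bmatrix} \dfrac{\partial H}{\partial x} \\[2mm] 0 \end{bmatrix} \tag{2.20} \]
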
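which is a matrix-matrix linear equation. Since we want (2.19), if the system

\[ ( \bar{w}^{T} \;\; \bar{q}^{T} ) \begin{bmatrix} \dfrac{\partial H}{\partial w} & \dfrac{\partial H}{\partial q} \\[2mm] E & 0 \end{bmatrix} = ( 0^{T} \;\; \hat{q}^{T}(0) ) \tag{2.21} \]

is solved for the vectors $\bar{w}$ and $\bar{q}$, then after multiplying both sides of (2.20) by $( \bar{w}^{T} \;\; \bar{q}^{T} )$, and using (2.21), we obtain

\[ ( 0^{T} \;\; \hat{q}^{T}(0) ) \begin{bmatrix} \dfrac{\partial w(0)}{\partial x} \\[2mm] \dfrac{\partial q(0)}{\partial x} \end{bmatrix} = - ( \bar{w}^{T} \;\; \bar{q}^{T} ) \begin{bmatrix} \dfrac{\partial H}{\partial x} \\[2mm] 0 \end{bmatrix} \]

which yields

\[ \left( \frac{\partial L}{\partial x} \right)_{EQ} = \hat{q}^{T}(0) \, \frac{\partial q(0)}{\partial x} = -\bar{w}^{T} \frac{\partial H}{\partial x} . \tag{2.22} \]

Now the entire gradient is given by

\[ \frac{\partial L}{\partial x} = \left( \frac{\partial L}{\partial x} \right)_{DY} + \left( \frac{\partial L}{\partial x} \right)_{EQ} . \]
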

This  four-step  procedure  has  been  implemented  in  a  general  circuit 
optimization  program  [27]  with  excellent  results.   Recently,  it  was 
shown  how  the  performance  function  (2.3a)  can  be  made  more  useful  by 
the  use  of  the  event  functional  [5]  which  allows  the  inclusion  of  time 
quantities,  such  as  time  delays,  within  the  entire  procedure. 
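
As a rough check on the four steps, the sketch below applies them to a hypothetical one-capacitor circuit that does not appear in the text: H(w, q, x) = w - (1 - q)/x = 0, E = 1, q(0) = 0 and e = (q - 1)^2, with a single designable parameter x (a resistance). The explicit Euler integration, the step count, and all names are assumptions made for illustration; a production transient analysis would use a more careful integration method. For this toy problem L = (x/2)(1 - exp(-2T/x)) in closed form, so the adjoint gradient can be compared with the exact derivative.

```python
import math

def adjoint_gradient(x, T=2.0, n=20000):
    """Return (L, dL/dx) for the toy circuit, following STEPS 1-4."""
    dt = T / n
    # STEP 1: q(0) = 0 is a constant, so the equilibrium portion
    # of the gradient (STEP 4) vanishes.
    q = [0.0] * (n + 1)
    # STEP 2: forward transient analysis (explicit Euler), accumulating L.
    L = 0.0
    for i in range(n):
        w = (1.0 - q[i]) / x           # solves H(w, q, x) = 0 for w
        L += dt * (q[i] - 1.0) ** 2    # e = (q - 1)^2
        q[i + 1] = q[i] + dt * w       # E w - dq/dt = 0 with E = 1
    # STEP 3: backward transient analysis of the adjoint equations.
    #   w_hat * dH/dw + q_hat * E   = -de/dw  gives  w_hat = -q_hat
    #   w_hat * dH/dq + dq_hat/dt   = -de/dq  gives
    #   dq_hat/dt = q_hat / x - 2 (q - 1),  with q_hat(t = T) = 0.
    q_hat = 0.0
    grad_dy = 0.0
    for i in range(n, 0, -1):
        w_hat = -q_hat
        dH_dx = (1.0 - q[i]) / x ** 2
        grad_dy += dt * w_hat * dH_dx  # de/dx = 0 for this choice of e
        q_hat -= dt * (q_hat / x - 2.0 * (q[i] - 1.0))  # step back to t - dt
    # STEP 4: equilibrium portion is zero here; total gradient = dynamic part.
    return L, grad_dy

def exact(x, T=2.0):
    """Closed-form L and dL/dx for the toy circuit."""
    decay = math.exp(-2.0 * T / x)
    return 0.5 * x * (1.0 - decay), 0.5 * (1.0 - decay) - (T / x) * decay

L_num, g_num = adjoint_gradient(1.0)
L_ref, g_ref = exact(1.0)
```

One forward and one backward transient analysis produce the full gradient regardless of the number of designable parameters, which is the reason the adjoint formulation is preferred over perturbing each parameter in turn.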



Clearly this entire procedure can be computationally very costly. Each function evaluation requested by the minimization algorithm requires a transient analysis of a system of nonlinear algebraic and differential equations, and a transient analysis of a system of linear time-varying algebraic and differential equations is additionally required for the gradient. It is therefore essential that 1) the minimization algorithm used be extremely efficient, requiring a small number of function and gradient evaluations to obtain the minimum, and 2) the entire four-step procedure outlined above be very efficiently implemented. Because the scalar performance function is generated heuristically, as mentioned previously, this entire design procedure is manually iterative, which emphasizes the need for overall efficiency.

2.4  Review  of  Unconstrained  Minimization 

Powell  recently  observed  that  in  the  last  several  years  most  of 
the  useful  work  in  the  area  of  unconstrained  minimization  has  been  in 
understanding,  improving  and  extending  existing  methods  rather  than 
devising  new  algorithms  [41].   Indeed  most,  if  not  all,  minimization 
algorithms  can  be  described  as  first  computing  a  direction  of  search 
from  the  current  estimate  to  the  solution  and  then  obtaining  the  next 
point  along  this  direction.   That  is,  if  the  unconstrained  minimization 
problem  is  expressed  by 

minimize   f(x)    , 


most algorithms, at the kth iteration, first compute a direction of search, represented by a vector $d^k$, by using the values of the function $f(x^k)$ and perhaps some of its derivatives. Then the next point $x^{k+1}$ is obtained by searching along this straight line in some manner to yield

\[ x^{k+1} = x^k + p_k d^k , \]

where $p_k$ is a scalar often called the step-length. Thus, most existing minimization algorithms have two principal steps in each iteration: choosing the direction of search, and the scalar search along this direction to obtain a suitable step-length $p_k$.

The direction of search is one of the differences among algorithms. The oldest minimization algorithms are 1) steepest descent, where the direction of search is in the direction of the negative gradient; 2) coordinate descent, where the direction of search is along each coordinate direction, i.e., one variable is adjusted at a time; and 3) Newton's method, where the direction of search is the product of the inverse of the second derivative matrix, the hessian, and the negative gradient [34]. The most robust of these algorithms is steepest descent, which converges with order one, while the fastest is Newton's method, which converges with second order for most functions. For this reason, from 1959 to the present, much of the activity in the area of unconstrained minimization algorithms has been to devise techniques that approach the speed of Newton's method without its disadvantages, in particular its requirement of the hessian matrix.
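
The three classical directions can be compared on a small quadratic, for which Newton's direction with unit step-length reaches the minimum in a single step. The function, the starting point, and the helper names below are hypothetical and serve only to illustrate the definitions.

```python
# Toy quadratic f(x) = x0^2 + 10 x1^2 + x0 x1 - 3 x0 (an assumed example).
# Its hessian is constant, so Newton's method is exact in one step.

def gradient(x):
    return [2.0 * x[0] + x[1] - 3.0, x[0] + 20.0 * x[1]]

HESSIAN = [[2.0, 1.0], [1.0, 20.0]]

def solve2(A, b):
    """Solve the 2x2 linear system A z = b by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]

x = [0.0, 1.0]
g = gradient(x)

d_steepest = [-g[0], -g[1]]                  # negative gradient
d_coordinate = [-g[0], 0.0]                  # adjust only the first variable
d_newton = solve2(HESSIAN, [-g[0], -g[1]])   # inverse hessian times -gradient

x_newton = [x[0] + d_newton[0], x[1] + d_newton[1]]  # unit step-length
```

Solving the linear system instead of forming the inverse hessian explicitly is the usual practice; the cost of supplying and factoring the hessian at every iteration is the disadvantage that later methods try to avoid.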

Davidon [14] in 1959 published an algorithm which uses only gradient information to build, in effect, an approximation of the hessian inverse as the algorithm progresses towards the solution; thus the method approaches Newton's method after a number of iterations. Davidon's method, which is based on hessian conjugate directions, gave birth to a very large number of algorithms generally called quasi-Newton algorithms [22,23,7,34].

Note that subscripts will be used for scalars.
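
The idea of building the hessian inverse from gradient information alone can be illustrated with the Davidon-Fletcher-Powell (DFP) update, written here from its standard textbook form. The sketch assumes a quadratic function, so that the accurate scalar search has a closed form; the matrix A, the vector b, and all helper names are hypothetical.

```python
# DFP sketch on f(x) = 0.5 x^T A x - b^T x (assumed quadratic test problem).
# H approximates the inverse hessian and is updated from gradient differences:
#   H <- H + s s^T / (s^T y) - (H y)(H y)^T / (y^T H y),
# where s = x_new - x and y = g_new - g.

def matvec(M, v):
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dfp_quadratic(A, b, x, iterations):
    n = len(x)
    H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    g = [ai - bi for ai, bi in zip(matvec(A, x), b)]
    for _ in range(iterations):
        d = [-c for c in matvec(H, g)]              # quasi-Newton direction
        p = -dot(g, d) / dot(d, matvec(A, d))       # exact step on a quadratic
        s = [p * di for di in d]
        x = [xi + si for xi, si in zip(x, s)]
        g_new = [ai - bi for ai, bi in zip(matvec(A, x), b)]
        y = [gn - go for gn, go in zip(g_new, g)]
        Hy = matvec(H, y)
        sy, yHy = dot(s, y), dot(y, Hy)
        H = [[H[i][j] + s[i] * s[j] / sy - Hy[i] * Hy[j] / yHy
              for j in range(n)] for i in range(n)]
        g = g_new
    return x, H

A = [[2.0, 1.0], [1.0, 20.0]]
b = [3.0, 0.0]
x_final, H_final = dfp_quadratic(A, b, [0.0, 0.0], 2)
```

With accurate scalar searches on a quadratic in n variables, n iterations suffice for H to equal the inverse hessian, which is the sense in which such methods approach Newton's method without being supplied the hessian.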

A characteristic of the earlier quasi-Newton algorithms was that the scalar search to compute $p_k$ had to solve the scalar minimization problem

\[ \min_{p} \; f(x^k + p \, d^k) \]

very accurately. The accurate solution of this problem is quite costly computationally, as many researchers have shown [46,34]. The elimination of the requirement to solve this scalar minimization problem accurately was the principal motivation for many of the latest quasi-Newton algorithms [21,7,12], although some researchers were additionally motivated by deriving algorithms which required only function values [40,49,12].

The amount of information about the function which must be supplied to minimization algorithms has been a motivation for the development of many new algorithms. The general tendency in deriving algorithms, since Davidon's classical contribution [14], has been to account for the hessian without having it supplied; i.e., making sure an algorithm would be efficient for quadratic functions. On the other hand, the new minimization algorithm developed in the next two chapters has the property of in effect accounting for even higher derivatives without having them supplied.

The new algorithm requires that the function and the first two derivatives, the gradient and the hessian, be supplied. The effect of the third and fourth derivatives is approximated from values of the first two derivatives. The algorithm offers several novel ideas to the area of unconstrained minimization, such as a variable order of convergence as high as four and a scalar search that may be along curved trajectories. Thus the new algorithm does not compute a direction of search which is always a straight line, as most existing algorithms do.

In circuit optimization procedures, as was shown, the function and the gradient can be computed requiring only first partial derivatives of the circuit equations and the performance function. The hessian would require second partial derivatives of the circuit equations, which in general are very difficult to derive and would require a large number of operations to handle (for linear circuits the second partial derivatives are zero and thus the hessian can be evaluated in a straightforward manner, as done in [54] for a Newton-like minimization algorithm). For this reason, in Chapter 4 we describe a difference scheme which is built into the new algorithm to approximate the hessian, thereby allowing the use of the new algorithm by supplying it with only function and gradient values.
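
The general idea of approximating the hessian from gradient values can be sketched with plain forward differences. This is not the scheme of Chapter 4, which also accounts for errors in the supplied values; the fixed step h, the symmetrization, and the test function below are assumptions made for illustration.

```python
import math

def hessian_from_gradients(grad, x, h=1e-6):
    """Forward-difference hessian built from n + 1 gradient evaluations."""
    n = len(x)
    g0 = grad(x)
    columns = []
    for j in range(n):
        xj = list(x)
        xj[j] += h                       # perturb one variable at a time
        gj = grad(xj)
        columns.append([(a - b) / h for a, b in zip(gj, g0)])
    # columns[j][i] approximates d2f / dx_i dx_j; average with the transpose
    # so that the returned matrix is symmetric.
    return [[0.5 * (columns[j][i] + columns[i][j]) for j in range(n)]
            for i in range(n)]

# Hypothetical test function f(x) = x0^2 x1 + sin(x1).
def grad(x):
    return [2.0 * x[0] * x[1], x[0] ** 2 + math.cos(x[1])]

H_approx = hessian_from_gradients(grad, [1.0, 2.0])
```

Each hessian approximation costs n additional gradient evaluations, so in circuit optimization it costs n additional pairs of transient analyses; the choice of h trades truncation error against the effect of errors in the gradient values.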


CHAPTER  3 

THE  VARIABLE-ORDER  ALGORITHM  FOR 
UNCONSTRAINED  MINIMIZATION 


In  this  chapter  a  new  algorithm,  called  the  Variable-Order  (VO) 
algorithm,  is  proposed  for  finding  a  solution  to  the  unconstrained 
minimization  problem 

\[ \min_{x} \; f(x) , \tag{3.1} \]

where f is a real-valued nonlinear function, $f: E^n \to E$, and $x \in E^n$. The equivalent maximization problem is also included in (3.1) since

\[ \max_{x} \; f(x) = -\min_{x} \; [-f(x)] . \]


Solution of the unconstrained minimization problem is not only important in its own right, but, as will be seen in Chapter 4, solution of the unconstrained problem is central to the solution of the constrained minimization problem which often arises in computer-aided circuit optimization procedures.

There are several existing algorithms designed to solve (3.1). These algorithms are all iterative, that is, beginning from an initial estimate of a solution $x^0$, a sequence $\{x^k\}$ is generated which under certain conditions converges to a solution $x^*$ of (3.1). The exact solution $x^*$ is rarely obtained in a finite number of iterations, but if the sequence has a high order of convergence the solution can be approximated closely in a finite, and hopefully small, number of iterations. Therefore, a desirable property for a minimization algorithm is that it generates convergent sequences with a high order of convergence.

If an algorithm generates convergent sequences from any initial point $x^0$, it is said to be globally convergent. It will be shown that the VO algorithm is globally convergent for pseudoconvex [35] functions that are twice continuously differentiable. Moreover, numerical experiments indicate that the VO algorithm is able to efficiently compute minima of some functions not satisfying these conditions.

If  an  algorithm  has  a  high  order  of  convergence,  reasonable 
accuracy  might  be  expected  when  the  algorithm  is  stopped  after  several 
iterations.   The  order  of  convergence  of  an  algorithm  may  be  defined 
by  a  value  r  such  that 


$$ \lim_{k \to \infty} \frac{\|x^{k+1} - x^*\|}{\|x^k - x^*\|^r} = C , $$

where $0 < C < \infty$ is a constant called the convergence ratio. Observe that if the distance $\|x^k - x^*\|$ is sufficiently small, $x^{k+1}$ will be much closer to $x^*$ for large $r$. Most algorithms have orders of convergence equal to two or less. For example: steepest descent converges linearly ($r = 1$); the conjugate gradients algorithm of Fletcher and Reeves converges linearly but with a smaller convergence ratio than that of steepest descent; and the quasi-Newton algorithm of Davidon, Fletcher and Powell approaches second-order convergence [34]. It will be shown that the VO algorithm has up to fourth-order convergence.
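
As an illustration of the order of convergence, the following minimal sketch estimates $r$ from the errors of a generated sequence. It assumes the errors behave like $e_{k+1} \approx C e_k^r$; the helper names and the scalar test problem (Newton's iteration on $x^2 - 2 = 0$) are illustrative choices, not taken from the text.

```python
import math


def estimate_order(errors):
    """Estimate r from consecutive errors e_{k-1}, e_k, e_{k+1}.

    If e_{k+1} ~ C * e_k**r, then
    r ~ log(e_{k+1} / e_k) / log(e_k / e_{k-1}).
    """
    return [
        math.log(errors[k + 1] / errors[k]) / math.log(errors[k] / errors[k - 1])
        for k in range(1, len(errors) - 1)
    ]


def newton_sqrt2_errors(x0=2.0, iterations=4):
    """Errors |x^k - sqrt(2)| for Newton's iteration on x**2 - 2 = 0."""
    x, target = x0, math.sqrt(2.0)
    errors = [abs(x - target)]
    for _ in range(iterations):
        x = 0.5 * (x + 2.0 / x)  # Newton step for x**2 - 2 = 0
        errors.append(abs(x - target))
    return errors


orders = estimate_order(newton_sqrt2_errors())
print(orders)  # the estimates should approach 2 for a second-order method
```

Only a few iterations are used because the error of a second-order sequence reaches machine precision quickly, after which the ratio of logarithms is no longer meaningful.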

The definition of the order of convergence implies that the higher the order, the faster a solution is approached, provided that the point $x^k$ is sufficiently close to the solution. Thus while a high-order algorithm is desirable in a small neighborhood of $x^*$, previous studies generally indicated that when $x^k$ is far from $x^*$ a lower-order algorithm was more efficient [30]. In fact the very popular class of algorithms called quasi-Newton algorithms has the property of being linearly convergent initially and becoming essentially second-order as the solution is approached [34]. The VO algorithm automatically adjusts its order at each iteration, generally selecting the order which allows the most progress towards the solution. The numerical results to be given in the next chapter show that the VO algorithm is more efficient than most existing algorithms.

The first section of this chapter reviews some of the existing theory associated with solution points of (3.1). The second section discusses the two major steps of a minimization algorithm: the transformation step and the scalar search step. The new techniques being introduced for the VO algorithm are compared with the techniques of previous algorithms in this section. The theoretical derivation of the algorithm is presented in the third through the fifth sections. Although the character of these three sections is theoretical, several numerical and practical considerations are discussed. The sixth and final section establishes the conditions for global convergence of the VO algorithm and its order of convergence.

3.1  Properties  of  Minima 
The problem to be solved is given by

$$ \min_{x} f(x) \tag{3.2} $$

where $f: E^n \to E$. Let $x^*$ be a solution; then if

$$ f(x^*) \le f(x) \tag{3.3} $$

for all possible $x$, the point $x^*$ is called a global minimum. If (3.3) holds in a small neighborhood about $x^*$, then the solution $x^*$ is called a local minimum.

One usually would like to determine the global minimum of (3.2). However, one must in general be content with a local minimum because a global minimum can only be identified if all minima are obtained, or if the function is assumed to have a convexity property (in which case all minima are global [34]). In contrast, local minima are identified under less stringent conditions on the function. The following theorem establishes these conditions.

Local Minimum Theorem. Let $f: E^n \to E$ and suppose that the first derivative $f'(x)$, the gradient of $f$, is continuous, and that the second derivative matrix $f''(x)$, the hessian of $f$, exists at $x^*$. If $x^*$ is a local minimum of $f$, then

1) $f'(x^*) = 0$ ,

and

2) $f''(x^*)$ is positive semidefinite.

Conversely, if

1) $f'(x^*) = 0$ ,

and

2) $f''(x^*)$ is positive definite,

then $x^*$ is a local minimum of $f$.

The proof is straightforward [34, 38], and it will not be repeated here. Note the subtle but significant difference between the necessary and sufficient conditions.

Most local minima are strict local minima. A strict local minimum solution $x^*$ is defined by

$$ f(x^* + y) > f(x^*) \tag{3.4} $$

for all $y \ne 0$ such that $x^* + y$ is in some neighborhood about $x^*$. The following theorem plays an important role in the test for convergence of the proposed algorithm.

Strict Local Minimum Theorem. Let $f: E^n \to E$ have a continuous hessian in some neighborhood about $x^*$. Then the point $x^*$ is a strict local minimum of $f$ if and only if both $f'(x^*) = 0$, and there exists an $\epsilon > 0$ such that for all $y$ satisfying $0 < \|y\| < \epsilon$, $f''(x^* + y)$ is positive definite.

This theorem is significant because in general a minimization algorithm stops at a point $x^k = \bar{x}$ which lies in a small neighborhood about $x^*$. The theorem states that if $x^*$ is a strict local minimum, the hessian should be positive definite at $\bar{x}$.

Proof: (Sufficiency) Assume $f'(x^*) = 0$ and that there exists an $\epsilon > 0$ such that for all $y$ satisfying $0 < \|y\| < \epsilon$, $f''(x^* + y)$ is positive definite. From the Taylor series with remainder one may write

$$ f(x^* + y) = f(x^*) + (1/2) \, y^T f''(x^* + t y) \, y $$

for some $0 < t < 1$. Then for $0 < \|y\| < \epsilon$

$$ f(x^* + y) - f(x^*) = (1/2) \, y^T f''(x^* + t y) \, y > 0 , $$

which shows $x^*$ is a strict local minimum.

(Necessity) Now assume $x^*$ is a strict local minimum. Then for arbitrary $y$

$$ \lim_{t \to 0} (1/t) [f(x^* + t y) - f(x^*)] = f'(x^*)^T y $$

by the definition of the derivative [38]. Assume $f'(x^*) \ne 0$. Then a vector $y$ exists such that $f'(x^*)^T y < 0$ (e.g. $y = -f'(x^*)$). Thus for suitably small $t > 0$

$$ (1/t) [f(x^* + t y) - f(x^*)] < 0 , $$

which contradicts the hypothesis that $x^*$ is a strict local minimum. Now consider a Taylor series expansion with remainder

$$ f(x^* + y) = f(x^*) + (1/2) \, y^T f''(x^* + t y) \, y $$

for some $0 < t < 1$. Then there exists a $\delta > 0$ such that for all $y$ satisfying $0 < \|y\| < \delta < \epsilon$

$$ (1/2) \, y^T f''(x^* + t y) \, y = f(x^* + y) - f(x^*) > 0 , $$

which implies the positive definiteness of the hessian in a small neighborhood about $x^*$ not necessarily including $x^*$ itself. This completes the proof.

If the hessian is either inaccurate or not supplied, a widely used test is to stop an algorithm at a point $\bar{x}$ for which

$$ \|f'(\bar{x})\| \le \epsilon_s \tag{3.5} $$

for some small $\epsilon_s > 0$, and

$$ f(\bar{x} + t e_i) \ge f(\bar{x}) , \quad i = 1, \ldots, n , \tag{3.6a} $$

and

$$ f(\bar{x} - t e_i) \ge f(\bar{x}) , \quad i = 1, \ldots, n , \tag{3.6b} $$

where $e_1, \ldots, e_n$ are the unit coordinate vectors, and $t > 0$ is some small scalar. The tests in (3.6) ensure that the function does not decrease in value along any of the coordinate lines. However, as the following simple example [34] shows, these tests are not sufficient for $\bar{x}$ to be an approximation to a local minimum solution. Consider the problem


$$ \min_{x} f(x) = x_1^3 - x_1^2 x_2 + 2 x_2^2 . $$

The point $(x_1, x_2) = (\bar{x}_1, \bar{x}_2) = (6, 9)$ satisfies

$$ [f'(x)]_1 = \frac{\partial f}{\partial x_1} = 3 x_1^2 - 2 x_1 x_2 = 0 , $$

and

$$ [f'(x)]_2 = \frac{\partial f}{\partial x_2} = -x_1^2 + 4 x_2 = 0 . $$

Furthermore, with $x_1$ fixed at $\bar{x}_1 = 6$, the expression

$$ f(6, 9 + t) \ge f(6, 9) $$

is satisfied for all $t$, and with $x_2$ fixed at $\bar{x}_2 = 9$, the expression

$$ f(6 + t, 9) \ge f(6, 9) $$

is satisfied for $t \ge -9$. Therefore both (3.5) and (3.6) should be satisfied. Despite this fact, the point $(\bar{x}_1, \bar{x}_2)$ is not a local minimum since the hessian evaluated at this point, given by

$$ f''(\bar{x}_1, \bar{x}_2) = \begin{bmatrix} 18 & -12 \\ -12 & 4 \end{bmatrix} , $$

is not positive semidefinite, since its determinant is negative. Therefore, any algorithm using (3.5) and (3.6) as its sole termination test cannot guarantee the computation of a local minimum.
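
The example above can be checked numerically. The following sketch evaluates the gradient, the coordinate tests (3.6), and the hessian determinant at $(6, 9)$, and then steps along the direction $(2, 3)$, which is an illustrative choice of a direction of negative curvature (it is not given in the text). The step sizes are likewise assumptions.

```python
def f(x1, x2):
    return x1**3 - x1**2 * x2 + 2.0 * x2**2


def gradient(x1, x2):
    return (3.0 * x1**2 - 2.0 * x1 * x2, -x1**2 + 4.0 * x2)


def hessian(x1, x2):
    return ((6.0 * x1 - 2.0 * x2, -2.0 * x1), (-2.0 * x1, 4.0))


x_bar = (6.0, 9.0)
t = 1e-3  # small coordinate step for tests (3.6); assumed value

grad = gradient(*x_bar)

# Tests (3.6a) and (3.6b): no decrease along either coordinate line.
coordinate_tests_pass = all(
    f(x_bar[0] + s * e1, x_bar[1] + s * e2) >= f(*x_bar)
    for (e1, e2) in ((1.0, 0.0), (0.0, 1.0))
    for s in (t, -t)
)

(h11, h12), (h21, h22) = hessian(*x_bar)
determinant = h11 * h22 - h12 * h21

# Along d = (2, 3) the quadratic form d^T f'' d = -36 is negative,
# so f decreases even though the coordinate tests pass.
s = 1e-2
decrease = f(x_bar[0] + 2.0 * s, x_bar[1] + 3.0 * s) - f(*x_bar)

print(grad, coordinate_tests_pass, determinant, decrease)
```

The gradient vanishes and both coordinate tests pass, yet the function decreases along the diagonal direction, which is exactly why (3.5) and (3.6) alone cannot certify a local minimum.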

3.2  Principal  Steps  in  a  Minimization  Algorithm 

A minimization algorithm iteratively generates a sequence of points $\{x^k\}$, starting from some initial point $x^0$, which hopefully converges to a solution of (3.2). It is convenient to view each iteration of the algorithm in terms of the expression

$$ x^{k+1} = H^k(p_k, x^k) \tag{3.7} $$

where $H^k$ is called the iteration function, which is a function of a scalar parameter $p_k$, the $k$-th estimate of the solution $x^k$, the function value at $x^k$, and possibly derivatives of $f$ at $x^k$. The iteration indicated by (3.7) may be separated into two steps. The first step consists of computing the transformation function given by

$$ y = h^k(p) = H^k(p, x^k) , \tag{3.8} $$

where for convenience the function $h^k: E \to E^n$ has been defined. This step will be called the transformation step. Its purpose is to compute a direction or trajectory from $x^k$, using $f(x^k)$ and possibly derivatives of $f$ at $x^k$, along which the next point $x^{k+1}$ will be selected. The transformation function represents this trajectory, and $p$ is a scalar parameter which is proportional to how far along this trajectory the next point $x^{k+1}$ will be. The second step consists of selecting or computing a suitable value of $p = p_k$ such that $x^{k+1}$ is given by

$$ x^{k+1} = h^k(p_k) . \tag{3.9} $$

This step is called the scalar search step.

In most existing algorithms [34], the transformation functions are linear in the scalar parameter $p$ and have the form

$$ h^k(p) = x^k + p \, d^k , \tag{3.10} $$

where $d^k$ is called the direction of search. For example, in the steepest descent algorithm [34], $d^k = -f'(x^k)$. In contrast, the transformation functions for the VO algorithm are polynomials in $p$, of degree up to three. These polynomials follow inherently from the derivation of the transformation functions for the VO algorithm to be given in the next section. The transformation functions for the new algorithm may thus yield curved trajectories of search instead of straight-line directions of search.

If the scalar search step computes a $p = p_k$ such that

$$ f(x^{k+1}) = f(h^k(p_k)) < f(x^k) \tag{3.11} $$

is satisfied, then each iteration of the algorithm will progress towards a solution of (3.2). The satisfaction of (3.11) at each iteration ensures that the sequence $\{f(x^k)\}$ is monotonically decreasing, a property that is generally required to establish the global convergence of an algorithm. Thus most algorithms, including the VO algorithm, compute a $p_k$ which satisfies (3.11). If a $p_k$ cannot be found to satisfy (3.11), an algorithm has either converged to a solution of (3.2), or it has failed. The following lemma, a generalization of an existing result [56], establishes sufficient conditions for the existence of a $p = p_k$ to satisfy (3.11).


Lemma 3.1 Assume $f(x)$ is differentiable at $x = x^k$, $h^k(p)$ is differentiable at $p = 0$, and $h^k(0) = x^k$. Define $h^{k\prime}(p)$ to be the first derivative of $h^k$ with respect to $p$. Then if

$$ f'(x^k)^T h^{k\prime}(0) < 0 , \tag{3.12} $$

there exists a $p > 0$ such that

$$ f(h^k(p)) < f(x^k) . \tag{3.13} $$

Proof: Using chain differentiation and the definition of the derivative one may write

$$ \lim_{p \to 0} (1/p) [f(h^k(p)) - f(h^k(0))] = f'(h^k(0))^T h^{k\prime}(0) . \tag{3.14} $$

Since $h^k(0) = x^k$, using (3.12), this expression becomes

$$ \lim_{p \to 0} (1/p) [f(h^k(p)) - f(x^k)] < 0 . \tag{3.15} $$

Then there exists an $\epsilon > 0$ such that for $p \ne 0$ and $-\epsilon < p < \epsilon$

$$ (1/p) [f(h^k(p)) - f(x^k)] < 0 . \tag{3.16} $$

Select $0 < p < \epsilon$ to preserve the inequality and it follows that

$$ f(h^k(p)) < f(x^k) , $$

which completes the proof.
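
Lemma 3.1 can be illustrated with a short sketch that halves $p$ until the descent condition (3.13) holds along a curved trajectory. The halving strategy, the helper name `find_descent_step`, the test function, and the curvature term `c` are all assumptions made for illustration; this is not the scalar search of the VO algorithm.

```python
def find_descent_step(f, h, x, p0=1.0, shrink=0.5, max_halvings=60):
    """Return some p > 0 with f(h(p)) < f(x), halving p from p0.

    Hypothetical helper: Lemma 3.1 guarantees such a p exists when
    f'(x)^T h'(0) < 0, so repeated halving eventually finds one.
    """
    fx = f(x)
    p = p0
    for _ in range(max_halvings):
        if f(h(p)) < fx:
            return p
        p *= shrink
    return None  # no decrease found: converged or failed


def f(x):
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] - x[0] ** 2) ** 2


def grad(x):
    r = x[1] - x[0] ** 2
    return (2.0 * (x[0] - 1.0) - 40.0 * x[0] * r, 20.0 * r)


x = (-1.0, 2.0)
g = grad(x)
c = (0.5, -0.5)  # arbitrary second-order term making the trajectory curved


def h(p):
    # h(0) = x and h'(0) = -g, so f'(x)^T h'(0) = -||g||^2 < 0, as in (3.12).
    return (x[0] - p * g[0] + p * p * c[0], x[1] - p * g[1] + p * p * c[1])


p = find_descent_step(f, h, x)
print(p, f(h(p)), f(x))
```

Note that the quadratic term `c` does not affect the existence of a descent step, since only the derivative of the trajectory at $p = 0$ enters condition (3.12).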

For the transformation function (3.10) found in most algorithms, (3.12) takes the following form

$$ f'(x^k)^T d^k < 0 , \tag{3.17} $$

which becomes

$$ -f'(x^k)^T f'(x^k) < 0 \tag{3.18} $$

for the steepest descent algorithm [34]. Thus as long as $f'(x^k) \ne 0$, the steepest descent algorithm should be able to obtain a decrease in the function. As will be seen, the VO algorithm also has the property of satisfying (3.12) whenever $f'(x^k) \ne 0$. That this property is highly desirable follows from the theorems given in the preceding section.

Assuming the existence of a $p$ which satisfies (3.11), the problem to be briefly considered now is the computation of a particular value $p_k$ to be used in obtaining $x^{k+1}$. There are two stages to this problem. First, the desired $p_k$ must be defined in some concrete manner, usually as the solution of a scalar problem. In most existing algorithms, the desired $p_k$ is defined as the solution of the following scalar minimization problem [34]

$$ f(h^k(p_k)) = \min_{p} f(h^k(p)) . \tag{3.19} $$

This value of $p$ should satisfy (3.11), and thus this scalar problem defines the desired $p_k$ in a suitable manner. Other existing algorithms, such as the Davidon [14] algorithm or its more popular modification due to Fletcher and Powell [22], have the requirement of defining $p_k$ by problem (3.19). While it may be argued that $p_k$ defined by (3.19) provides the most decrease in the function at the $k$-th iteration, this $p_k$ may not be the best one in the sense of minimizing the overall number of iterations. For example, when $x^k$ is far from a solution $x^*$ of (3.2), the $p_k$ given by (3.19) tends to force all future iterations to follow the bottom of narrow valleys with slow progress towards $x^*$ [30]. Thus ideally $p_k$ should be defined by (3.19) whenever $x^k$ is close to $x^*$, and in some other manner whenever $x^k$ is far from $x^*$.

Secondly, once the problem which defines the desired $p_k$ has been given, an approximation to the solution of the problem must be efficiently computed. It is important that the overall algorithm not fail if rough approximations to the solution are computed to achieve savings of computer time. For example, many studies have indicated that the overall efficiency of the popular algorithm due to Fletcher and Powell and of the one due to Fletcher and Reeves (both of which theoretically require the solution of (3.19)) is sensitive to the accuracy of the approximate solution of (3.19) [34, 46]. The scalar search for the VO algorithm was developed under these considerations. The details are given in Section 3.4.

3.3  Variable-Order  Transformations 

In this section the transformation functions for the VO algorithm are derived. It will be seen that these transformation functions require evaluation of higher-order derivatives. However, approximations are possible which allow the algorithm to be used even when only function values are supplied.

The motivation for the variable-order transformations stems from a desire to approximate the behavior of the gradient of $f$ with information at the present point $x^k$. It will be shown that it is possible to represent the point $x^*$, at which the gradient is zero, by an infinite series. The variable-order transformation functions result from truncations of this series. A parameter $p$ appearing in the infinite series in effect accounts for the terms of the series which are dropped.

We begin by noting that a necessary condition for $x^*$ to be a solution of the minimization problem (3.2) is that the gradient of $f$ at $x^*$ be zero. Then any solution of (3.2) must satisfy the system of equations

$$ f'(x) = 0 . \tag{3.20} $$

Now consider the change of variables denoted by

$$ x = X(z) , \tag{3.21} $$

where $X$ may be a nonlinear function. Using (3.21), the equation (3.20) becomes

$$ f'(x) = f'(X(z)) = g(z) . \tag{3.22} $$

Define a $z^*$ such that

$$ x^* = X(z^*) , \tag{3.23} $$

then from (3.22)

$$ g(z^*) = 0 , \tag{3.24} $$

since $f'(x^*) = 0$. If the function $g$ is simple to invert so that $z^*$ may be found from (3.24) by

$$ z^* = g^{-1}(0) , \tag{3.25} $$

then the solution $x^*$ of (3.20) may be found from (3.23). Clearly, specifying a change of variables which yields a function $g$ which is simple to invert could be difficult. However, as we now show, we can start by specifying a suitable function $g$ and determine the resulting change of variable function $X$. To this end assume some $g$ function has been specified. Then $X(z)$ may be expanded in a Taylor series about some $z = z^k$, so that (3.21) becomes

$$ x = X(z^k) + X'(z^k)(z - z^k) + (1/2) [X''(z^k)(z - z^k)](z - z^k) + \cdots , \tag{3.26} $$

where from (3.21), $x^k$ may be associated with $z^k$ by

$$ X(z^k) = x^k , \tag{3.27} $$

and

$$ X'(z^k) = f''(x^k)^{-1} g'(z^k) , \tag{3.28} $$

$$ X''(z^k) = f''(x^k)^{-1} \left[ g''(z^k) - [f'''(x^k) X'(z^k)] X'(z^k) \right] , \tag{3.29} $$

are obtained by repeated differentiation of (3.22) and evaluating the resulting expressions at $x = x^k$ and $z = z^k$. Since, as is the case for the present, an $x^k$ is known or given, a corresponding $z^k$ may be found from (3.22) by

$$ z^k = g^{-1}(f'(x^k)) , \tag{3.30} $$

since it was assumed that $g$ is simple to invert. Therefore all the terms of the series have been defined, assuming all the derivatives and the inverse of the hessian exist. Finally, since we are interested in $x^*$ given by $X(z^*)$ in (3.23), we obtain the infinite series representation of a solution to (3.20) given by

$$ x^* = x^k + X'(z^k)(z^* - z^k) + (1/2) [X''(z^k)(z^* - z^k)](z^* - z^k) + \cdots , \tag{3.31} $$

where $z^*$ is given by (3.25), $z^k$ by (3.30), and the derivatives $X'(z^k)$, $X''(z^k)$, ... are obtained by differentiation as shown in (3.28) and (3.29).
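
The repeated differentiation that leads to (3.28) and (3.29) can be written out as follows. This is a sketch of the intermediate steps, using the bracket convention of the text for the action of the third-derivative tensor, and assuming the hessian is invertible at $x^k$.

```latex
% Differentiate g(z) = f'(X(z)) once with respect to z (chain rule):
g'(z) = f''(X(z)) \, X'(z)
\quad\Longrightarrow\quad
X'(z) = f''(X(z))^{-1} g'(z) .

% Differentiate a second time:
g''(z) = [f'''(X(z)) \, X'(z)] \, X'(z) + f''(X(z)) \, X''(z)
\quad\Longrightarrow\quad
X''(z) = f''(X(z))^{-1} \left[ g''(z) - [f'''(X(z)) \, X'(z)] \, X'(z) \right] .

% Evaluating both results at z = z^k, where X(z^k) = x^k by (3.27),
% gives (3.28) and (3.29).
```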

We now turn to the selection of a suitable function $g$. Since the function $g$ should be simple to invert, a logical choice might be

$$ g(z) = (z_1, z_2, \ldots, z_n)^T , \tag{3.32} $$

for which the function and its inverse are identical. For this function, the infinite series (3.31) becomes

$$ x^* = x^k - d_2^k - d_3^k - d_4^k - \cdots , \tag{3.33a} $$

where

$$ d_2^k = f''(x^k)^{-1} f'(x^k) , \tag{3.33b} $$

$$ d_3^k = (1/2) \, f''(x^k)^{-1} [f'''(x^k) d_2^k] d_2^k , \tag{3.33c} $$

$$ d_4^k = f''(x^k)^{-1} \left[ [f'''(x^k) d_2^k] d_3^k - (1/6) [[f^{iv}(x^k) d_2^k] d_2^k] d_2^k \right] . \tag{3.33d} $$

This infinite series is a well-known result extended to n dimensions. (In 1755, Euler derived an infinite series for a solution of the scalar problem $f(x) = 0$ [19]. Other recent derivations and more extensive studies of this scalar series have been published [47, 39, 50, 32, 42]. The Euler series becomes (3.33) when extended to n dimensions for the solution of problem (3.20).) If this series is truncated to two terms, an iterative method can be constructed given by

$$ x^{k+1} = x^k - d_2^k , \tag{3.34} $$

which is Newton's iteration for solving (3.20) [34]. However, this iterative method (or iterative methods obtained from any truncations of (3.33)) may not converge to a solution of the minimization problem (3.2), because the infinite series (3.33) may not itself be a convergent series. Thus we conclude that the series resulting from the simple function $g$ given by (3.32) does not in general produce a suitable infinite series.

One of the proposed modifications to (3.34) which improves its potential for convergence is the introduction of a scalar $p_k$ to "damp" the iteration [34], given by

$$ x^{k+1} = x^k - p_k \, d_2^k . $$

For the minimization problem (3.2), this iteration becomes the two-step procedure given by the computation of the transformation function

$$ h^k(p) = x^k - p \, d_2^k , $$

and the computation of a suitable value $p = p_k$ to obtain $x^{k+1}$ given by

$$ x^{k+1} = h^k(p_k) . $$
Motivated by this modification, we propose the following function

$$ g(z) = (z_1^p, z_2^p, \ldots, z_n^p)^T , \tag{3.35} $$

where $z_i^p$ means $z_i$ raised to the $p$-th power. Note that this $g$ function is simple to invert. Furthermore,

$$ z^* = g^{-1}(0) = 0 , \tag{3.36} $$

and any higher derivatives with respect to $z$ are readily obtained. Taking the first two terms of the resulting series (3.31) yields the iterative method

$$ x^{k+1} = x^k - p_k \, f''(x^k)^{-1} f'(x^k) . \tag{3.37} $$

Thus the second-order transformation function may be defined by

$$ h_2^k(p) = x^k - p \, d_2^k , \tag{3.38} $$

where

$$ d_2^k = f''(x^k)^{-1} f'(x^k) \tag{3.39} $$

is defined to be the second-order correction. (Note that the order refers to the order of convergence of the sequences which are generated by the algorithm.) Thus Newton's method for solving minimization problems falls out as a special case of the proposed algorithm. Taking three terms of (3.31) yields the third-order transformation given by

$$ h_3^k(p) = x^k - (1/2)(3 - p) \, p \, d_2^k - p^2 d_3^k , \tag{3.40} $$

where

$$ d_3^k = (1/2) \, f''(x^k)^{-1} [f'''(x^k) d_2^k] d_2^k \tag{3.41} $$

is the third-order correction. Similarly, using the first four terms of (3.31) yields the fourth-order transformation given by

$$ h_4^k(p) = x^k - (1/6)(p^2 - 6p + 11) \, p \, d_2^k - (2 - p) \, p^2 d_3^k - p^3 d_4^k , \tag{3.42} $$

where the fourth-order correction is defined to be

$$ d_4^k = f''(x^k)^{-1} \left[ [f'''(x^k) d_2^k] d_3^k - (1/6) [[f^{iv}(x^k) d_2^k] d_2^k] d_2^k \right] . \tag{3.43} $$

Transformation functions of order higher than four may be similarly derived. However, as will be seen below, it does not seem possible to adequately approximate the corrections of order greater than four. Moreover, higher-order derivatives require considerable storage and they are very difficult to derive in general, and therefore we try to avoid them. Additionally, the techniques proposed in Section 3.4 for the scalar search would not be as attractive for transformation functions of order higher than four because the zeros of polynomials of degree greater than two would be needed. Finally, it was experimentally found that for one function tested, transformation functions of order higher than four did not increase overall efficiency in computing the solution. The selection of which transformation function to use at each iteration will be described later.
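
To make the transformation functions (3.38), (3.40), and (3.42) concrete, the following sketch evaluates them in the scalar case $n = 1$, where the hessian inverse is a division and the tensor products are ordinary products. The test function $f(x) = e^x - 2x$ (minimum at $x^* = \ln 2$) and the starting point are illustrative assumptions, not taken from the text.

```python
import math


def corrections(x):
    """Exact corrections (3.39), (3.41), (3.43) for f(x) = exp(x) - 2x."""
    f1 = math.exp(x) - 2.0        # f'(x)
    f2 = f3 = f4 = math.exp(x)    # f''(x), f'''(x), f''''(x)
    d2 = f1 / f2
    d3 = 0.5 * f3 * d2 * d2 / f2
    d4 = (f3 * d2 * d3 - f4 * d2**3 / 6.0) / f2
    return d2, d3, d4


def h(r, p, x):
    """Transformation function of order r evaluated at parameter p."""
    d2, d3, d4 = corrections(x)
    if r == 2:
        return x - p * d2
    if r == 3:
        return x - 0.5 * (3.0 - p) * p * d2 - p**2 * d3
    if r == 4:
        return (x - (p**2 - 6.0 * p + 11.0) * p * d2 / 6.0
                - (2.0 - p) * p**2 * d3 - p**3 * d4)
    raise ValueError("order must be 2, 3, or 4")


x_k, x_star = 1.0, math.log(2.0)
errors = {r: abs(h(r, 1.0, x_k) - x_star) for r in (2, 3, 4)}
print(errors)
```

For this particular function the series (3.33) converges at the chosen point, so at $p = 1$ each additional term brings the trajectory closer to $x^*$; as the text notes, this is not guaranteed in general.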

3.3.1  Approximations  of  Higher-Order  Corrections 

Observe that the computation of the third-order correction (3.41) would require both the evaluation of $f'''(x^k)$, a third-order tensor, and a considerable number of multiplications. This computational effort can be reduced by using the approximation

$$ d_3^k \approx f''(x^k)^{-1} f'(h_2^k(1)) , \tag{3.44} $$

where

$$ h_2^k(1) = x^k - d_2^k \tag{3.45} $$

from (3.38). The approximation (3.44) follows from the Taylor series expansion

$$ f'(x^k - d_2^k) = f'(x^k) - f''(x^k) d_2^k + (1/2) [f'''(x^k) d_2^k] d_2^k - \cdots . \tag{3.46} $$

Using (3.39), the first two terms cancel, and therefore

$$ f'(h_2^k(1)) = f'(x^k - d_2^k) \approx (1/2) [f'''(x^k) d_2^k] d_2^k \tag{3.47} $$

is an approximation with error on the order of $\|d_2^k\|^3$, assuming the fourth derivative of $f$ is bounded. Comparing (3.47) with the equation for the third-order correction given by (3.41) yields the approximation (3.44). Similarly, the fourth-order correction (3.43) may be approximated by

$$ d_4^k \approx f''(x^k)^{-1} f'(h_3^k(1)) , \tag{3.48} $$

where

$$ h_3^k(1) = x^k - d_2^k - d_3^k \tag{3.49} $$

from (3.40). Using the approximation for $d_3^k$ given by (3.44), the approximation (3.48) has an error bounded in terms of $(\|d_2^k\| + \|d_3^k\|)$, assuming the fourth derivative of $f$ is bounded. The approximation and the error bound follow from the Taylor series expansion of $f'(x^k - d_2^k - d_3^k)$ and the use of (3.39) and (3.44). The error in these approximations continues to increase for corrections of higher order. Furthermore, these errors are enlarged since the higher-order corrections are multiplied by increasing powers of $p$, as can be seen from the third-order (3.40) and fourth-order (3.42) transformations. This error is the major reason for considering only transformations of order four or less.
In the next chapter, approximations to the hessian and the gradient of $f$ will be presented. These approximations will allow the algorithm to be used even when only function values are supplied.
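
The approximations (3.44) and (3.48) can be tried on a scalar example, where they need only gradient evaluations and one hessian value. The sketch below again assumes the illustrative function $f(x) = e^x - 2x$, for which the exact corrections reduce to $d_3 = d_2^2 / 2$ and $d_4 = d_2^3 / 3$ because all derivatives of order two and higher equal $e^x$.

```python
import math


def f_prime(x):
    return math.exp(x) - 2.0


def f_second(x):
    return math.exp(x)


def approximate_corrections(x):
    """Corrections computed from gradient values only, per (3.44) and (3.48)."""
    d2 = f_prime(x) / f_second(x)            # (3.39), exact
    d3 = f_prime(x - d2) / f_second(x)       # (3.44), uses f'(h_2(1))
    d4 = f_prime(x - d2 - d3) / f_second(x)  # (3.48), uses f'(h_3(1))
    return d2, d3, d4


x_k, x_star = 1.0, math.log(2.0)
d2, d3_approx, d4_approx = approximate_corrections(x_k)

d3_exact = 0.5 * d2**2   # (3.41) for this particular f
d4_exact = d2**3 / 3.0   # (3.43) for this particular f

print(d3_approx, d3_exact)
print(d4_approx, d4_exact)
```

In this example the approximate fourth-order correction is noticeably less accurate than the approximate third-order one, consistent with the remark that the error grows with the order of the correction.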

3.3.2  Transformation  Order  Selection 

Assuming that all of the transformation functions exist (this assumption is removed later in this chapter), we wish to consider the question of which one should be used in each iteration.

Recall that each transformation function may be thought of as a curved trajectory passing through $x^k$, with the scalar parameter $p$ proportional to how far from $x^k$ the next point might be. Ideally the best transformation function order to use is the one whose trajectory passes the closest to $x^*$. It might seem that the higher the order, the closer to the solution $x^*$, since more terms are taken in the infinite series representation of $x^*$. However, there are two reasons why this seemingly reasonable expectation is not usually true. First, the process of computing the terms of the transformation functions involves several approximations and many arithmetic operations with ensuing errors. Second, and perhaps more importantly, the infinite series represents $x^*$ only if it converges; it must also converge very fast. It was indeed verified numerically that usually one of the transformation functions is better than the others at each iteration, in the sense of giving trajectories closer to $x^*$, and the best one is not necessarily the one with the highest order.

The proposed technique for selecting the best transformation function is based on the convergence of the infinite series (3.33) to a solution of the minimization problem (3.2). Recall that this infinite series resulted from the $g$ function (3.32), or from the function which was eventually used, given by (3.35), with the scalar parameter $p$ set to one. The procedure may be described as follows. Select the second-order transformation if

$$ f(h_2^k(1)) = f(x^k - d_2^k) > f(x^k) = f(h_2^k(0)) , \tag{3.50} $$

otherwise select the $(r-1)$-th order transformation if

$$ f(h_r^k(1)) > f(h_{r-1}^k(1)) , \quad r = 3 \text{ and } 4 . \tag{3.51} $$

If (3.51) is not satisfied for $r = 4$, the fourth-order transformation is selected. Thus when orders three or four are selected, a value of $p = 1$ always gives a point which yields a function value less than the present value. It was experimentally verified that this method selected the best order in most iterations. In those very few iterations where it did not select the best order, the order selected was only insignificantly different from the best one.
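
The selection rule of (3.50) and (3.51) can be sketched as a small function. The name `select_order`, the dictionary of trial points, and the quadratic test function with made-up trial points are assumptions for illustration; in the algorithm itself the trial points would be $h_r^k(1)$ computed from the corrections.

```python
def select_order(f, x, h_at_one):
    """Select the transformation order following (3.50) and (3.51).

    h_at_one maps each order r in {2, 3, 4} to the trial point h_r(1).
    """
    if f(h_at_one[2]) > f(x):                    # test (3.50)
        return 2
    for r in (3, 4):                             # test (3.51) for r = 3, 4
        if f(h_at_one[r]) > f(h_at_one[r - 1]):
            return r - 1
    return 4


def f(x):
    return x * x  # illustrative scalar function with minimum at 0


x = 1.0
# Made-up trial points, chosen only to exercise each branch of the rule.
order_a = select_order(f, x, {2: 2.0, 3: 0.5, 4: 0.1})  # h_2(1) increases f
order_b = select_order(f, x, {2: 0.1, 3: 0.5, 4: 0.0})  # order 3 worse than 2
order_c = select_order(f, x, {2: 0.5, 3: 0.1, 4: 0.3})  # order 4 worse than 3
order_d = select_order(f, x, {2: 0.5, 3: 0.3, 4: 0.1})  # each order improves
print(order_a, order_b, order_c, order_d)
```

The tests are applied in sequence, so a higher order is only considered after every lower order has produced a decrease at $p = 1$.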

Once one of the three transformation functions is selected, the dimension of the problem has been reduced to one. To see this, note that at the $k$-th iteration the problem remaining is to compute a value of the scalar parameter $p$ such that

$$ f(h_r^k(p)) < f(x^k) = f(h_r^k(0)) , \quad r = 2, 3, \text{ or } 4 , \tag{3.52} $$

where the transformation functions were given earlier, but will be repeated here for ease of reference. Thus

$$ h_2^k(p) = x^k - p \, d_2^k , \tag{3.53a} $$

$$ h_3^k(p) = x^k - (3/2) \, d_2^k \, p - [d_3^k - (1/2) \, d_2^k] \, p^2 , \tag{3.53b} $$

$$ h_4^k(p) = x^k - (11/6) \, d_2^k \, p - [2 d_3^k - d_2^k] \, p^2 - [d_4^k - d_3^k + (1/6) \, d_2^k] \, p^3 , \tag{3.53c} $$

where the terms have been rearranged in powers of $p$, and $d_2^k$, $d_3^k$, and $d_4^k$ are given by (3.39), (3.44), and (3.48) respectively. The computation of a suitable value of $p$ is described in the next section.


3.4   The  Scalar  Search 

The scalar search for an appropriate value of the parameter p which appears in the higher-order transformation functions is often the most time-consuming step in a minimization algorithm. The difficulty arises because most existing algorithms require a value of p that accurately solves a scalar minimization problem, to be described below. In this section it will be shown that the scalar search step for the VO algorithm is not time-consuming, due to inherent properties of the transformation functions.
Furthermore, during the initial iterations of the VO algorithm, when the estimate $x^k$ of the solution $x^*$ is far from the solution, the desired value of p is defined to be the solution of a scalar problem not used in any previously reported algorithm.

Recall that at this stage we wish to find a value of $p = p_k$ such that

$$ x^{k+1} = h_r^k(p_k) , \quad r = 2, 3, \text{ or } 4 , \tag{3.54} $$

where $h_r^k$ is one of the transformation functions given in (3.53). If the current point $x^k$ is not $x^*$, a solution of the minimization problem (3.2), then the scalar search should select a value of p such that the descent condition

$$ f(h_r^k(p)) < f(x^k) \tag{3.55} $$

is satisfied. If $x^k$ is equal to $x^*$ the scalar search is unnecessary.

Normally there is an infinite number of values of p for which (3.55) is satisfied. The best value of p to choose would be the one that minimizes the total number of iterations to approximate $x^*$. However, the computation of this value of p is impossible for general problems since it requires information from future iterations. Some of the considerations which lead to approximations of the best value of p are given next.

Assume a $p_k = p_k^m$ exists such that from (3.54) we obtain

$$ x^{k+1} = x^* = h_r^k(p_k^m) . $$

Clearly the best value of p in this case would be $p_k^m$, and $p_k^m$ would be defined by

$$ f(h_r^k(p_k^m)) = \min_p \; f(h_r^k(p)) , \tag{3.56} $$

which is a scalar minimization problem. Most existing algorithms choose $p_k$ to be an approximation to $p_k^m$ at each iteration; some algorithms theoretically require that $p_k$ be an accurate approximation to $p_k^m$ [46], in contrast with the VO algorithm which has no such requirement. In practice $x^{k+1} = h_r^k(p_k^m)$ is seldom equal to $x^*$. While it may be convincingly argued that $p_k = p_k^m$ is an optimal value for some iterations, particularly when $x^k$ is in some neighborhood of $x^*$, the best value of p to choose is not $p_k^m$ for most iterations. In fact, when $x^k$ is far from $x^*$, choosing $p_k = p_k^m$ tends to force any minimization algorithm to follow the bottom of narrow valleys with typically slow progress towards $x^*$ [30]. Therefore, an ideal scheme would choose $p_k = p_k^m$ when $x^k$ is in some neighborhood of $x^*$, and would choose $p_k$ to stay away from the bottom of narrow valleys whenever $x^k$ is far from $x^*$.

The scalar search of the VO algorithm first attempts to establish whether $x^k$ is close to $x^*$. If $x^k$ is close to $x^*$, then $p_k$ is set to an approximation of $p_k^m$, a solution of the scalar minimization problem (3.56). The details are described below. If $x^k$ is far from $x^*$, then $p_k$ is computed under the principle that $x^{k+1} = h_r^k(p_k)$ should be as far away from $x^k$ as possible. The method for computing $p_k$, when $x^k$ is far from $x^*$, will be described in two parts: 1) if the second-order transformation was selected (r = 2), and 2) if the third- or the fourth-order transformation was selected (r = 3 or 4).


3.4.1 Iterations Close to $x^*$


If $x^k$ is close to $x^*$, then $\| f'(h_r^k(1)) \|$ will be small due to the manner in which the transformation functions $h_r^k$ were derived. Thus, if

$$ \| f'(h_r^k(1)) \| \le \epsilon_c , \tag{3.57} $$

for some $\epsilon_c > 0$, it is concluded that the choice $p_k = p_k^m$ should be made. (For the tested functions, which are not badly scaled, $\epsilon_c = 1$ was reasonable.) The test (3.57) can be made without further gradient evaluations since it was shown in Section 3.3.1 that the two gradients $f'(h_2^k(1))$ and $f'(h_3^k(1))$ are evaluated while computing the approximations to the third- and fourth-order corrections given in (3.48). If the fourth-order transformation is selected, it was found that it is not necessary to evaluate $f'(h_4^k(1))$, but rather the results of the test for $f'(h_3^k(1))$ could be used instead.

Having identified that $x^k$ is close to $x^*$ in the above manner, an approximation to $p_k^m$ needs to be computed. The following procedure was satisfactory for the functions tested. Evaluate $f(h_r^k(p))$ for p = 2, 3, ..., L-1, L, L+1, until

$$ f(h_r^k(L-1)) > f(h_r^k(L)) \le f(h_r^k(L+1)) , \tag{3.58} $$

which is a standard method of bracketing the scalar minimum [30]. The minimum of the quadratic polynomial in p which passes through the three points obtained in (3.58) is given by [30]

$$ \hat{p} = L + 1 - \frac{1}{2}\, \frac{f(h_r^k(L-1)) - 4 f(h_r^k(L)) + 3 f(h_r^k(L+1))}{f(h_r^k(L-1)) - 2 f(h_r^k(L)) + f(h_r^k(L+1))} . \tag{3.59} $$

If $\hat{p}$ is close to L ($|\hat{p} - L| \le .02$), set $p_k = L$ to complete the scalar search for this iteration. If $\hat{p}$ is not close to L, then $f(h_r^k(\hat{p}))$ is evaluated, and if $f(h_r^k(\hat{p})) < f(h_r^k(L))$, set $p_k = \hat{p}$, otherwise set $p_k = L$. This completes the scalar search for the case when $x^k$ is close to $x^*$.

For most local minima $p_k^m = 1$ and the above procedure should yield $p_k = 1$, requiring only one additional function evaluation. However, for local minima with a positive semidefinite hessian, $p_k^m$ will generally be greater than one. Therefore, in the actual implementation shown in Appendix I, if the minimum is located for p > 4, the function is evaluated at p = 10, 22, 46, 94, 190, ..., etc., until the minimum is bracketed.
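The bracketing step (3.58) and the three-point quadratic fit (3.59) can be sketched as below. This is a simplified illustration under stated assumptions: `phi` is a hypothetical callable returning f(h(p)), the values `f0 = phi(0)` and `f1 = phi(1)` are taken as already known, the accelerated steps p = 10, 22, 46, ... of Appendix I are omitted, and the `max_p` guard is an addition that is not part of the described procedure.

```python
def search_near_minimum(phi, f0, f1, tol=0.02, max_p=200):
    """Approximate the scalar minimizer of phi(p) = f(h(p)) near a solution.

    phi, f0, f1, tol, max_p are illustrative names, not taken from the source.
    """
    vals = [f0, f1]          # vals[p] holds phi(p) for integer p
    L = 1
    while True:
        vals.append(phi(L + 1))
        # Bracketing test (3.58): phi(L-1) > phi(L) <= phi(L+1)
        if vals[L - 1] > vals[L] <= vals[L + 1]:
            break
        L += 1
        if L > max_p:        # safety guard (assumption, not in the text)
            raise RuntimeError("minimum not bracketed")
    fm, fc, fp = vals[L - 1], vals[L], vals[L + 1]
    # Minimum of the quadratic through the three bracketing points (3.59);
    # the denominator is positive because fm > fc and fp >= fc.
    p_hat = L + 1 - 0.5 * (fm - 4.0 * fc + 3.0 * fp) / (fm - 2.0 * fc + fp)
    if abs(p_hat - L) <= tol:
        return float(L)
    # Accept the interpolated value only if it improves on phi(L).
    return p_hat if phi(p_hat) < fc else float(L)


phi = lambda p: (p - 2.5) ** 2
print(search_near_minimum(phi, phi(0), phi(1)))
```

For a scalar function that is exactly quadratic, as in this usage example, the fit recovers the minimizer after the bracket is found.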


3.4.2 Iterations Far from $x^*$


When $x^k$ is far from $x^*$ the expression (3.57) is not satisfied. In this case, the basic principle to be proposed is that $x^{k+1}$ should be as far away from $x^k$ as possible, subject to satisfying (3.55). This principle defines the desired $p_k$ to be a solution of

$$ \underset{p}{\text{maximize}} \quad \| h_r^k(p) - x^k \| , \tag{3.60a} $$

$$ \text{subject to} \quad f(h_r^k(p)) < f(x^k) - c_k , \tag{3.60b} $$

where $c_k > 0$ will be defined to insure that $f(x^{k+1})$ is sufficiently less than $f(x^k)$. An accurate solution of (3.60) would be difficult to obtain computationally. However, first, an accurate solution is not required, and second, when the third- or fourth-order transformation is selected, trial values of p that may approximate a solution of (3.60) may be found from already available information. The procedure, if the second-order transformation is selected, is described first.

Second-Order Transformation Selected. If the second-order transformation is selected at the kth iteration, the search for an approximation to a solution of (3.60) is along a straight line in the space of the independent variables, since $h_2^k(p)$ is a linear function of p; thus (3.60a) is linear in p and it is maximized by the largest possible value of p. In this case $c_k$ is set to zero, which implies that we desire any descent. The procedure proposed may be generally described as fitting approximating polynomials and computing their minima, in an attempt to satisfy the descent constraint (3.60b). Then, in an attempt to satisfy (3.60a), a constant is added to the computed minimum of the approximating polynomial. The details are given next.

The following information is already available: $f(h_2^k(0))$, $f'(h_2^k(0))$, $f(h_2^k(1))$, and $f'(h_2^k(1))$. Moreover,

$$ f'(h_2^k(0))^T\, {h_2^k}'(0) < 0 , \tag{3.61} $$

where ${h_2^k}'(0)$ is the first derivative of $h_2^k(p)$ with respect to p, evaluated at p = 0. Expression (3.61) implies the existence of a p > 0 that satisfies (3.60b) (see Lemma 3.1). If the expression

$$ f(h_2^k(1)) < f(h_2^k(0)) = f(x^k) \tag{3.62} $$

is satisfied, then (3.60b) is also satisfied. Whenever (3.62) is satisfied, it is computationally efficient to select $p_k = 1$ since the function and the gradient are already evaluated for the next iteration. In addition, it is unlikely that the descent constraint (3.60b) will be satisfied for $p_k > 1$ because of the manner in which the transformation function order is selected. Thus whenever (3.62) is satisfied, $p_k$ is set to one.

If (3.62) is not satisfied, the minimum within the interval (0, 1) of the cubic Hermite polynomial in p fitted through the available information is computed as follows [14]

$$ p_c = 1 - (s_1 + a - b)/(s_1 - s_0 + 2a) , \tag{3.63a} $$

where

$$ s_1 = f'(h_2^k(1))^T\, {h_2^k}'(1) , \tag{3.63b} $$

$$ s_0 = f'(h_2^k(0))^T\, {h_2^k}'(0) , \tag{3.63c} $$

$$ b = 3\, [\, f(h_2^k(0)) - f(h_2^k(1)) \,] + s_0 + s_1 , \tag{3.63d} $$

$$ a = [\, b^2 - s_0 s_1 \,]^{1/2} . \tag{3.63e} $$


Then let

$$ \bar{p}_c = \max \{ 0.1 ,\; p_c + \min \{ p_c ,\, 1 - p_c \}/2 \} . \tag{3.64} $$

Then after evaluating $f(h_2^k(\bar{p}_c))$, if

$$ f(h_2^k(\bar{p}_c)) < f(x^k) , \tag{3.65} $$

set $p_k = \bar{p}_c$, and the scalar search is done. Observe that (3.64) is an attempt at satisfying (3.60a). Finally, if (3.65) is not satisfied, the procedure becomes iterative. The minimum of the quadratic in p through the function and derivative at p = 0, and through the function at $p = \bar{p}_c$, is computed as follows

$$ p_c = \tfrac{1}{2}\, \bar{p}_c^{\,2}\, s_0 \,/\, [\, \bar{p}_c\, s_0 + f(h_2^k(0)) - f(h_2^k(\bar{p}_c)) \,] , \tag{3.66} $$

where $s_0$ is given in (3.63c). Define

$$ \bar{p}_c = \max \{ p_c ,\; \bar{p}_c/4 \} . \tag{3.67} $$

Then after evaluating $f(h_2^k(\bar{p}_c))$, if (3.65) is satisfied, the search is done with $p_k = \bar{p}_c$, otherwise the process is repeated beginning with (3.66). Figure 3.1 summarizes the steps in a flow chart. For the functions tested, the second-order transformation was rarely selected. When it was selected, the search usually ended with (3.64), so only one additional function evaluation was needed.
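The second-order search of (3.62) through (3.67) can be sketched as a short routine. This is an illustration under assumptions, not the code of Appendix I: `phi` is a hypothetical callable for f(h_2^k(p)), the slopes `s0` and `s1` stand for the directional derivatives (3.63c) and (3.63b), and the `max_iter` guard is an added safeguard.

```python
import math


def second_order_search(phi, f0, s0, f1, s1, max_iter=50):
    """Scalar search far from the solution when the order-2 transformation is used.

    phi(p) = f(h_2(p)); f0 = phi(0), f1 = phi(1); s0 < 0 and s1 are the slopes
    of phi at p = 0 and p = 1. All names are illustrative assumptions.
    """
    if f1 < f0:                                        # (3.62): p = 1 already descends
        return 1.0
    b = 3.0 * (f0 - f1) + s0 + s1                      # (3.63d)
    a = math.sqrt(b * b - s0 * s1)                     # (3.63e)
    p_c = 1.0 - (s1 + a - b) / (s1 - s0 + 2.0 * a)     # (3.63a): cubic Hermite minimum
    p_bar = max(0.1, p_c + min(p_c, 1.0 - p_c) / 2.0)  # (3.64): step past the minimum
    for _ in range(max_iter):
        f_bar = phi(p_bar)
        if f_bar < f0:                                 # (3.65): descent obtained
            return p_bar
        # (3.66): minimum of the quadratic through phi(0), phi'(0), phi(p_bar)
        p_c = 0.5 * p_bar ** 2 * s0 / (p_bar * s0 + f0 - f_bar)
        p_bar = max(p_c, p_bar / 4.0)                  # (3.67): limit the reduction
    raise RuntimeError("no descent found")             # guard, not part of the text


phi = lambda p: (p - 0.3) ** 2
print(second_order_search(phi, phi(0.0), -0.6, phi(1.0), 1.4))
```

In the usage example the cubic fit locates the minimum at p = 0.3, and (3.64) moves the trial point beyond it to about p = 0.45, which still gives descent.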

Third- or Fourth-Order Transformation Selected. Whenever the third- or fourth-order transformation is selected, the search for a $p_k$ to approximate a solution of (3.60) is no longer along straight lines in the space of the independent variables. Note that $h_3^k(p)$ and $h_4^k(p)$ given in (3.53b) and (3.53c) are polynomials in p of degree greater than one. For clarity of notation, the superscript and subscript of the transformation function will be dropped; i.e., for the present discussion

$$ h(p) = h_r^k(p) , \quad r = 3 \text{ or } 4 . \tag{3.68} $$

Additionally, the individual components of the transformation vector function will be needed. Thus let h(p) be defined by

$$ h(p) = ( h_1(p),\, h_2(p),\, \ldots,\, h_n(p) )^T . \tag{3.69} $$

Therefore, since $x^{k+1} = h(p_k)$, the ith component of all the possible points that may become $x^{k+1}$ is given by

$$ x_i = h_i(p) . \tag{3.70} $$

[Figure 3.1 Flowchart of a portion of the Scalar Search step of the VO algorithm, for the case when 1) $x^k$ is far from $x^*$, and 2) the second-order transformation is selected. The boxes compute $p_c$ from (3.63), compute $\bar{p}_c$ from (3.64), evaluate $f(h_2^k(\bar{p}_c))$ and, while the descent test fails, recompute $p_c$ from (3.66) and $\bar{p}_c$ from (3.67); on success $x^{k+1} = h_2^k(p_k)$ and the search is done.]


This time, maximizing $\| h(p) - x^k \|$, as defined in (3.60), may not be simply achieved by increasing p, as it is if the second-order transformation is selected. In particular, since each coordinate is given by (3.70) there may be certain values of p for which the square of the difference

$$ (x_i - x_i^k)^2 = (h_i(p) - x_i^k)^2 \tag{3.71} $$

is a maximum. This would certainly tend to satisfy the principle that $x^{k+1}$ should be as far as possible from $x^k$. A necessary condition to maximize (3.71), which would tend to satisfy (3.60a), is given by differentiating (3.71) with respect to p and setting it equal to zero, to obtain the equation

$$ h_i'(p) = 0 . \tag{3.72} $$

This equation is a linear equation in p for the third-order transformation, and a quadratic equation in p for the fourth-order transformation. Therefore, its zeros may be easily found. If any of these zeros is positive, the ith coordinate moves away from $x_i^k$ and, at the value of p equal to a positive zero of (3.72), it begins to move closer to $x_i^k$ again. Therefore, positive zeros are suitable candidates to satisfy (3.60a). It is proposed that these zeros be computed for all coordinates using (3.72), and that any which are not positive be discarded. These zeros will be considered trial values of p later.

Additional trial values of p are obtained by the following argument. After expansion of f(x) in a Taylor series about $x^k$ and substitution of x = h(p), the following expression is obtained

$$ f(h(p)) = f(x^k) + f'(x^k)^T [\, h(p) - x^k \,] + \cdots . \tag{3.73} $$

Since it is desired to compute a p which yields f(h(p)) sufficiently less than $f(x^k)$, the term

$$ f'(x^k)^T [\, h(p) - x^k \,] \tag{3.74} $$

should be as negative as possible. Therefore, values of p for which (3.74) may achieve a minimum value are easily computed candidates. The necessary condition yields

$$ f'(x^k)^T h'(p) = 0 , \tag{3.75} $$

which is a polynomial equation in p with zeros that may yield additional trial values of p, if any of the zeros are positive.
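The collection of trial values from (3.72) and (3.75) can be sketched as follows. The sketch assumes, purely for illustration, that the transformation is stored as h(p) = x^k + c1 p + c2 p^2 + c3 p^3 with coefficient vectors `c1`, `c2`, `c3` (with `c3` zero for the third-order case); these names and this storage scheme are hypothetical.

```python
import math


def positive_zeros(a0, a1, a2=0.0):
    """Positive real zeros of a0 + a1*p + a2*p**2 (linear when a2 == 0)."""
    if a2 == 0.0:
        if a1 == 0.0:
            return []
        roots = [-a0 / a1]
    else:
        disc = a1 * a1 - 4.0 * a2 * a0
        if disc < 0.0:
            return []
        r = math.sqrt(disc)
        roots = [(-a1 - r) / (2.0 * a2), (-a1 + r) / (2.0 * a2)]
    return sorted({p for p in roots if p > 0.0})


def trial_values(grad, c1, c2, c3=None):
    """Trial values of p: positive zeros of h_i'(p) (3.72) and of grad^T h'(p) (3.75)."""
    n = len(c1)
    if c3 is None:
        c3 = [0.0] * n
    trials = set()
    for i in range(n):
        # h_i'(p) = c1_i + 2 c2_i p + 3 c3_i p^2
        trials.update(positive_zeros(c1[i], 2.0 * c2[i], 3.0 * c3[i]))
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    trials.update(positive_zeros(dot(grad, c1), 2.0 * dot(grad, c2), 3.0 * dot(grad, c3)))
    return sorted(trials)


print(trial_values([1.0, 1.0], [-3.0, 1.0], [1.0, 1.0]))
```

Non-positive zeros are discarded here, as in the text; the restriction to the range 1 < p < 6 is applied at a later step of the search.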

Before describing how these trial values of p are used in approximating a solution of (3.60), the $c_k$ appearing in (3.60b) needs to be defined. Recall that f(h(1)) is already evaluated and it is less than $f(x^k)$. The constant $c_k$ is defined such that a value of p could be used provided it does not yield a function value too much greater than f(h(1)). This is accomplished by defining $c_k$ by

$$ c_k = \begin{cases} -\min \{\, 10 f(h(1)) - f(x^k) ,\; .1\, w \,\} , & f(h(1)) > 0 , \\ -\min \{\, .1 f(h(1)) - f(x^k) ,\; .1\, w \,\} , & f(h(1)) < 0 , \end{cases} \tag{3.76} $$

where

$$ w = f(h(1)) - f(x^k) . $$
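A small numerical sketch of (3.76) may help in reading the definition. The function name `descent_margin` is hypothetical, and the treatment of the case f(h(1)) = 0, which (3.76) as printed does not list, is an assumption (both branches give the same value there).

```python
def descent_margin(f_x, f_h1):
    """Constant c_k of (3.76); f_x = f(x^k), f_h1 = f(h(1)) with f_h1 < f_x."""
    w = f_h1 - f_x                      # negative, since f(h(1)) < f(x^k)
    if f_h1 > 0.0:
        return -min(10.0 * f_h1 - f_x, 0.1 * w)
    # f(h(1)) <= 0; the case f(h(1)) == 0 is included here by assumption
    return -min(0.1 * f_h1 - f_x, 0.1 * w)


# With f(x^k) = 10 and f(h(1)) = 2, (3.60b) requires f(h(p)) < 10 - c_k.
print(descent_margin(10.0, 2.0))
```

In this example the term .1 w is the active one, so the constraint asks for at least one tenth of the decrease already obtained at p = 1. When f(h(1)) is small and positive, the other term becomes active and the constraint reduces to f(h(p)) < 10 f(h(1)).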


The constraint (3.60b) is now defined and the zeros previously found are candidates to satisfy it. It was found experimentally that a value of p greater than six never satisfied (3.60b). In addition, since f(h(1)) is less than $f(x^k)$, only values of p in the range 1 < p < 6 are considered (note that non-integer values are used). All the zeros previously obtained from (3.72) and (3.75) within the above range are sorted. Then, beginning from the largest value and proceeding to the smallest one, the function is evaluated and as soon as (3.60b) is satisfied, the scalar search is complete. In case (3.72) and (3.75) yield no trial values of p in the range 1 < p < 6, f(h(p)) is evaluated for $p = 2, 3, \ldots, p_t, p_t + 1$, until $f(h(p_t))$ satisfies (3.60b) and $f(h(p_t + 1))$ does not. For all the functions tested, in most iterations (3.72) and (3.75) yielded trial values of p. Furthermore, in most iterations only one additional function evaluation was needed to end the search.
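The use of the trial values can be sketched as below. This is an illustration under assumptions: `phi` is a hypothetical callable for f(h(p)), `trials` is a list of positive zeros from (3.72) and (3.75), the fallback to p = 1 when every trial value fails is not stated in the text, and capping the integer stepping at `p_max` is likewise an added guard.

```python
def higher_order_search(phi, f_x, c_k, trials, p_max=6.0):
    """Pick p for the third- or fourth-order transformation, far from the solution."""
    bound = f_x - c_k                               # right-hand side of (3.60b)
    candidates = sorted((p for p in trials if 1.0 < p < p_max), reverse=True)
    if candidates:
        for p in candidates:                        # largest trial value first
            if phi(p) < bound:
                return p
        return 1.0                                  # assumption: p = 1 is known to descend
    # No trial values in range: step through p = 2, 3, ... while (3.60b) holds.
    p_t = 1
    while p_t + 1 <= p_max and phi(p_t + 1) < bound:
        p_t += 1
    return float(p_t)


phi = lambda p: (p - 3.0) ** 2
print(higher_order_search(phi, phi(0.0), 0.5, [0.5, 2.0, 5.5, 7.0]))
```

Trying the largest admissible trial value first reflects the aim of (3.60a), which favours the point farthest from $x^k$ among those satisfying the descent constraint.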

3.5  Hessian  Singular  or  Negative  Definite 
In computing the second-, third- and fourth-order corrections given by (3.39), (3.44), and (3.48) there are two major difficulties to be considered concerning the hessian matrix: it may be singular and it may be negative definite. If the hessian is singular, the proposed corrections cannot be computed. If the hessian is negative definite, the current point $x^k$ is not in some small neighborhood of a strict local minimum. Furthermore, if the hessian is not positive definite, the proposed transformations may not give descent trajectories. Recall that if

$$ f'(x^k)^T\, {h_r^k}'(0) < 0 , \tag{3.77} $$

then the existence of a p > 0 which satisfies

$$ f(h_r^k(p)) < f(x^k) \tag{3.78} $$

is implied as shown in Lemma 3.1. Observe that

$$ {h_r^k}'(0) = -a_r\, d_2^k , \quad r = 2, 3, \text{ or } 4 , \tag{3.79} $$

where from (3.53), $a_2 = 1$, $a_3 = 3/2$, and $a_4 = 11/6$. Therefore, the descent condition (3.77) becomes

$$ -a_r\, f'(x^k)^T [\, f''(x^k) \,]^{-1} f'(x^k) < 0 , \tag{3.80} $$

where (3.39), which defines $d_2^k$, was used. The inequality (3.80) may not necessarily be satisfied whenever $f''(x^k)$ is not positive definite. Moreover, even if (3.80), and thus (3.78), is satisfied when the hessian is not positive definite, the descent trajectory may still be undesirable. Recall that the transformation functions were derived to compute a point which would yield a zero gradient. While the gradient is zero at a local minimum, it is also zero at a local maximum and at a saddle point [38]. Therefore, a descent trajectory may be towards a saddle point. Saddle points are even more difficult since the transformation functions may yield trajectories towards a saddle point even when the hessian is positive definite. This difficulty will be discussed again when the global convergence of the algorithm is established in the next section. Thus singularity and non-positive definiteness of the hessian are signals to be used in avoiding these troublesome points.
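As a one-dimensional illustration of how (3.80) can fail (an example chosen here for concreteness, not taken from the text), consider $f(x) = -x^2$ at $x^k = 1$, where the hessian is negative definite:

```latex
f'(x^k) = -2 , \qquad f''(x^k) = -2 , \qquad d_2^k = [f''(x^k)]^{-1} f'(x^k) = 1 ,
\\
-a_r\, f'(x^k)\, [f''(x^k)]^{-1} f'(x^k) = -a_r \cdot \frac{(-2)^2}{-2} = 2 a_r > 0 .
```

Condition (3.80) is therefore violated for every order, and the second-order trajectory $h_2^k(p) = 1 - p$ initially moves toward the maximum at x = 0.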

Since the hessian inverse is used in Newton's minimization algorithm [34], several alternatives have been proposed whenever the hessian is not positive definite [30,34]. The method we propose is a modification of the one given by Murray [36] for a Newton-like minimization algorithm. The principle of Murray's method may be described as the computation of the Newton or second-order correction, $d_2^k$, by solving the linear system of equations

$$ F^k d_2^k = f'(x^k) , \tag{3.81a} $$

with

$$ F^k = f''(x^k) + D^k , \tag{3.81b} $$

where $D^k$ is a diagonal matrix which is computed to insure that $F^k$ is positive definite. If the hessian $f''(x^k)$ is already positive definite, the matrix $D^k$ is the zero matrix, and $d_2^k$ is the second-order correction as defined earlier. Observe that in the VO algorithm, the approximations to the third- and the fourth-order corrections, (3.44) and (3.48), may also be defined as solutions of linear systems of equations with coefficient matrices equal to $F^k$.

Murray's procedure for computing $D^k$ is based on the Cholesky factorization of a positive definite matrix. The Cholesky factorization may be described as the computation of the upper triangular matrix U, such that

$$ F^k = U^T U . \tag{3.82} $$

A by-product of this factorization is the diagonal matrix $D^k$. The modification we propose adds a pivoting strategy to this factorization procedure. The result of this modification is that the diagonal matrix $D^k$ will tend to have fewer nonzero diagonal elements than with the original procedure. Once the factorization is computed, the computation of all the high-order corrections is simply obtained, as shown later. The details of Murray's procedure are given next, followed by the details of the proposed modification.


3.5.1  Murray's  Procedure 


Equating matrix elements in (3.82) yields the ith row of U given by

$$ u_{ii} = \Big\{ [F^k]_{ii} - \sum_{m=1}^{i-1} u_{mi}^2 \Big\}^{1/2} , \tag{3.83a} $$

$$ u_{ij} = \Big\{ [F^k]_{ij} - \sum_{m=1}^{i-1} u_{mi} u_{mj} \Big\} \Big/ u_{ii} , \quad j = i+1, \ldots, n . \tag{3.83b} $$

It can be shown [53-54] that if $F^k$ is positive definite, all diagonal elements given by (3.83a) are greater than zero, and that all the elements of U are bounded by

$$ 0 \le |u_{ij}| \le \max \{\, [F^k]_{ii}^{1/2} ,\; i = 1, \ldots, n \,\} . \tag{3.84} $$

The procedure due to Murray is to in effect obtain $D^k$ in (3.81) such that all diagonal elements in (3.83a) are bounded by

$$ \delta \le u_{ii} \le n \beta , \tag{3.85} $$

where $\beta$ may be defined by

$$ \beta = \max \{\, |[f''(x^k)]_{ij}|^{1/2} ,\; i, j = 1, \ldots, n \,\} , \tag{3.86} $$

and $\delta > 0$ is a given constant which is used below to in effect identify the square root of a numerical zero due to round-off errors ($\delta = 10^{-s/2}$ gave good results, where s is the number of significant digits of the numbers in the computer). The off-diagonal elements are also bounded by

$$ |u_{ij}| \le \beta , \quad i = 1, \ldots, n-1 ; \; j > i . \tag{3.87} $$

The ith stage of the procedure may be described as follows. Define the quantity

$$ \hat{u}_i = \max \Big\{ \delta ,\; \Big| [f''(x^k)]_{ii} - \sum_{m=1}^{i-1} u_{mi}^2 \Big|^{1/2} \Big\} , \tag{3.88} $$

which will be considered as the candidate for the diagonal $u_{ii}$, and

$$ \tilde{u}_j = [f''(x^k)]_{ij} - \sum_{m=1}^{i-1} u_{mi} u_{mj} , \quad j = i+1, \ldots, n . \tag{3.89} $$

Observe that $u_{ij} = \tilde{u}_j / u_{ii}$, j = i+1, ..., n, will be the off-diagonal elements of the ith row once $u_{ii}$ has been computed. If the relation given by

$$ (1/\hat{u}_i) \max \{\, |\tilde{u}_j| ,\; j = i+1, \ldots, n \,\} \le \beta \tag{3.90} $$

is satisfied, then set

$$ u_{ii} = \hat{u}_i , \tag{3.91} $$

otherwise set

$$ u_{ii} = (1/\beta) \max \{\, |\tilde{u}_j| ,\; j = i+1, \ldots, n \,\} . \tag{3.92} $$

Then the rest of the ith row is given by

$$ u_{ij} = \tilde{u}_j / u_{ii} , \quad j = i+1, \ldots, n . \tag{3.93} $$

The ith diagonal of the matrix $D^k$ is given by

$$ d_{ii} = u_{ii}^2 - [f''(x^k)]_{ii} + \sum_{m=1}^{i-1} u_{mi}^2 . \tag{3.94} $$

It can be shown [36], in a straightforward manner, that the bounds (3.85) and (3.87) apply to this procedure.
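The stages (3.88) through (3.94) can be sketched in a few lines. This is a plain-Python illustration under assumptions, not the routine of Appendix I: the name `murray_factor`, the list-of-lists storage, and the default `delta` are hypothetical choices.

```python
import math


def murray_factor(H, delta=1e-8):
    """Modified Cholesky factorization without pivoting.

    Returns (U, D) with U upper triangular and D the diagonal of D^k, so that
    U^T U = H + diag(D). H is a symmetric matrix given as a list of lists.
    """
    n = len(H)
    beta = math.sqrt(max(abs(v) for row in H for v in row))          # (3.86)
    U = [[0.0] * n for _ in range(n)]
    D = [0.0] * n
    for i in range(n):
        ssq = sum(U[m][i] ** 2 for m in range(i))
        u_hat = max(delta, math.sqrt(abs(H[i][i] - ssq)))            # (3.88)
        u_til = [H[i][j] - sum(U[m][i] * U[m][j] for m in range(i))
                 for j in range(i + 1, n)]                           # (3.89)
        theta = max((abs(t) for t in u_til), default=0.0)
        # (3.90)-(3.92): keep the candidate unless it makes the row too large
        U[i][i] = u_hat if theta / u_hat <= beta else theta / beta
        for j, t in zip(range(i + 1, n), u_til):
            U[i][j] = t / U[i][i]                                    # (3.93)
        D[i] = U[i][i] ** 2 - H[i][i] + ssq                          # (3.94)
    return U, D


H = [[0.0, 1.0, -10.0], [1.0, 4.0, 0.0], [-10.0, 0.0, 400.0]]
U, D = murray_factor(H)
print(D)
```

The matrix in the usage example is the one used in the illustrative example of Section 3.5.3, so the printed diagonal can be compared with the values quoted there for the procedure without pivoting.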

3.5.2  Proposed  Modification  to  Murray's  Procedure 

The modification to Murray's procedure is motivated by a desire that the number of nonzero elements in $D^k$ be as few as possible, to in effect use as much of the hessian as possible. If some form of diagonal pivoting is added to Murray's procedure, not only will the number of nonzero diagonals of $D^k$ tend to be small, but numerical stability may also be gained. Therefore, it is proposed that at each stage of the factorization procedure, the strongest diagonal is selected, where the strongest diagonal is defined as the diagonal which generates the smallest, in absolute value, maximum off-diagonal element in its row. Efficient implementation of this modification is described next.

First, recognize that the Cholesky factorization of $F^k$ with diagonal pivoting may be described as the computation of the upper triangular matrix U such that

$$ P F^k P^T = U^T U , \tag{3.95} $$

where P is a permutation matrix with columns equal to a permutation of the columns of the unit diagonal matrix. The procedure begins by initializing the elements of U as follows

$$ u_{ij}^{(0)} = [f''(x^k)]_{ij} , \quad i = 1, \ldots, n ; \; j = i, \ldots, n . \tag{3.96} $$

The elements of U are then modified iteratively. To describe the ith stage of the procedure we begin by defining the set

$$ \{ e_k \} = \big\{ \max \{\, |u_{jk}^{(i-1)}| ,\; j = i, \ldots, k-1 ;\;\; |u_{kj}^{(i-1)}| ,\; j = k+1, \ldots, n \,\} ,\;\; k = i, \ldots, n \big\} , \tag{3.97} $$

which contains the maximum absolute value of the off-diagonal elements in each row not yet processed. The next diagonal to be selected for pivoting is based on the following sequential tests:


1) If i = n the set $\{ e_k \}$ is empty. Select the remaining diagonal.

2) Otherwise, if any element of $\{ e_k \}$ is zero, select the diagonal corresponding to the first such zero element. This is a row where all off-diagonal elements are zero.

3) Otherwise, if the set $\{\, e_k / |u_{kk}^{(i-1)}| ,\; u_{kk}^{(i-1)} \ne 0 ,\; k = i, \ldots, n \,\}$ is not empty, select the diagonal corresponding to its smallest element (the first one if ties exist).

4) Otherwise, select the diagonal corresponding to the smallest element of the set $\{ e_k \}$ (the first one if ties exist). This choice occurs when all remaining diagonal elements are zero.


The appropriate interchange of rows and columns is done next in order to bring the selected diagonal into the ith row. This interchange is noted in the permutation matrix P of (3.95). Now define

$$ \hat{u}_i = \max \{\, \delta ,\; |u_{ii}^{(i-1)}|^{1/2} \,\} . \tag{3.98} $$

If

$$ (1/\hat{u}_i) \max \{\, |u_{ij}^{(i-1)}| ,\; j = i+1, \ldots, n \,\} \le \beta , \tag{3.99} $$

set

$$ u_{ii}^{(i)} = \hat{u}_i , \tag{3.100} $$

otherwise set

$$ u_{ii}^{(i)} = (1/\beta) \max \{\, |u_{ij}^{(i-1)}| ,\; j = i+1, \ldots, n \,\} . \tag{3.101} $$

The ith diagonal of the permuted D matrix is given by

$$ d_{ii} = [\, u_{ii}^{(i)} \,]^2 - u_{ii}^{(i-1)} . \tag{3.102} $$

The rest of the row becomes

$$ u_{ij}^{(i)} = u_{ij}^{(i-1)} / u_{ii}^{(i)} , \quad j = i+1, \ldots, n . \tag{3.103} $$

The rest of the matrix is updated as follows

$$ u_{jl}^{(i)} = u_{jl}^{(i-1)} - u_{ij}^{(i)} u_{il}^{(i)} , \quad j = i+1, \ldots, n ; \; l = j, \ldots, n . \tag{3.104} $$

This completes the ith stage of the procedure. The pivoting strategy proposed insures that the remaining matrix is changed by a small amount, since the maximum absolute value of the change to the remaining matrix in (3.104) was minimized. Observe that double precision is recommended to store the matrix U in the modified procedure since inner products can no longer be efficiently accumulated [53-54] as is possible in the original method.
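The pivoted procedure can be sketched as follows. This is an illustration under assumptions rather than the code of Appendix I: the name `pivoted_murray_factor` is hypothetical, the working matrix is kept in full symmetric storage instead of the upper triangle, and the permutation is returned as an index list `perm` (with `perm[i]` the original index of the diagonal pivoted at stage i) rather than as the matrix P.

```python
import math


def pivoted_murray_factor(H, delta=1e-8):
    """Modified Cholesky with diagonal pivoting: returns (U, D, perm).

    U^T U equals the permuted H + diag(D); D is given in the original ordering.
    """
    n = len(H)
    A = [row[:] for row in H]                       # working copy, kept symmetric
    beta = math.sqrt(max(abs(v) for row in H for v in row))
    perm = list(range(n))
    U = [[0.0] * n for _ in range(n)]
    D = [0.0] * n
    for i in range(n):
        rows = range(i, n)
        # e[k]: largest off-diagonal magnitude in each unprocessed row
        e = {k: max((abs(A[k][j]) for j in rows if j != k), default=0.0) for k in rows}
        zeros = [k for k in rows if e[k] == 0.0]
        ratios = [(e[k] / abs(A[k][k]), k) for k in rows if A[k][k] != 0.0]
        if i == n - 1:
            k = i                                   # test 1: one diagonal remains
        elif zeros:
            k = zeros[0]                            # test 2: a row with no off-diagonals
        elif ratios:
            k = min(ratios)[1]                      # test 3: smallest ratio, first on ties
        else:
            k = min((e[q], q) for q in rows)[1]     # test 4: all diagonals are zero
        # Symmetric interchange of rows and columns i and k.
        A[i], A[k] = A[k], A[i]
        for row in A:
            row[i], row[k] = row[k], row[i]
        for row in U:
            row[i], row[k] = row[k], row[i]
        perm[i], perm[k] = perm[k], perm[i]
        u_hat = max(delta, math.sqrt(abs(A[i][i])))
        theta = max((abs(A[i][j]) for j in range(i + 1, n)), default=0.0)
        U[i][i] = u_hat if theta / u_hat <= beta else theta / beta
        D[perm[i]] = U[i][i] ** 2 - A[i][i]
        for j in range(i + 1, n):
            U[i][j] = A[i][j] / U[i][i]
        for j in range(i + 1, n):                   # update the remaining matrix
            for l in range(i + 1, n):
                A[j][l] -= U[i][j] * U[i][l]
    return U, D, perm


H = [[0.0, 1.0, -10.0], [1.0, 4.0, 0.0], [-10.0, 0.0, 400.0]]
U, D, perm = pivoted_murray_factor(H)
print(perm, D)
```

The usage example applies the sketch to the matrix of Section 3.5.3; the zero-based `perm` corresponds to the pivot order quoted there.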

3.5.3   Illustrative  Example 


The following example illustrates the effect of pivoting. Let $f''(x^k)$ for a three-dimensional problem be given by

$$ f''(x^k) = \begin{bmatrix} 0 & 1 & -10 \\ 1 & 4 & 0 \\ -10 & 0 & 400 \end{bmatrix} . $$

Let $\delta$ be a small power of ten, and for this matrix $\beta = 20$ from (3.86). Without pivoting, the method proposed by Murray yields $D^k$ given by

$$ D^k = \mathrm{diag}\,(.25,\; 4,\; 800) , $$

and

$$ U = \begin{bmatrix} .5 & 2 & -20 \\ 0 & 2 & 20 \\ 0 & 0 & 20 \end{bmatrix} . $$

The proposed modification, with a diagonal pivot order 3, 2, 1, yields

$$ D^k = \mathrm{diag}\,(1,\; 0,\; 0) , $$

and

$$ U = \begin{bmatrix} 20 & 0 & -.5 \\ 0 & 2 & .5 \\ 0 & 0 & (.5)^{1/2} \end{bmatrix} . $$

3.5.4  Computation  of  High-Order  Corrections 

After the factorization (3.95) is completed, the high-order corrections are computed as follows. Instead of (3.39), we may now write

$$ [\, f''(x^k) + D^k \,]\, d_2^k = f'(x^k) . \tag{3.105} $$

Define v by

$$ d_2^k = P^T v , \tag{3.106} $$

where $P^T$ is the transpose of the permutation used in the factorization procedure. Now multiply (3.105) by P, and use (3.106) to obtain

$$ P\, [\, f''(x^k) + D^k \,]\, P^T v = P f'(x^k) . \tag{3.107} $$

From (3.81) and (3.95), the factorization process transforms (3.107) into

$$ U^T U v = P f'(x^k) , \tag{3.108} $$

which may be solved by first solving

$$ U^T w = P f'(x^k) \tag{3.109} $$

for w by forward substitution, and then solving

$$ U v = w \tag{3.110} $$

for v by back substitution. The second-order correction is then obtained from (3.106). Similarly, the approximation to the third-order correction given in (3.44) now becomes the solution of

$$ [\, f''(x^k) + D^k \,]\, d_3^k = f'(h_2^k(1)) , \tag{3.111} $$

and the fourth-order correction given by (3.48) now becomes

$$ [\, f''(x^k) + D^k \,]\, d_4^k = f'(h_3^k(1)) . \tag{3.112} $$

These two systems of equations have the same coefficient matrix as (3.105), and therefore the same factorization applies. The solutions of (3.111) and (3.112) are obtained similarly to the procedure outlined for solving (3.105).
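The solution steps (3.106) through (3.110) can be sketched as below. The function name `solve_with_factor` is hypothetical, and the permutation is assumed to be given as an index list `perm`, where row i of the permuted system corresponds to original index `perm[i]`; this representation is an illustrative choice.

```python
def solve_with_factor(U, perm, g):
    """Solve [f'' + D] d = g given the factor U of the permuted matrix.

    U    : upper triangular factor with U^T U = P [f'' + D] P^T
    perm : index list representing P (assumed representation)
    g    : right-hand side, e.g. a gradient vector
    """
    n = len(g)
    b = [g[perm[i]] for i in range(n)]                 # P g
    w = [0.0] * n
    for i in range(n):                                 # forward substitution (3.109)
        w[i] = (b[i] - sum(U[m][i] * w[m] for m in range(i))) / U[i][i]
    v = [0.0] * n
    for i in reversed(range(n)):                       # back substitution (3.110)
        v[i] = (w[i] - sum(U[i][j] * v[j] for j in range(i + 1, n))) / U[i][i]
    d = [0.0] * n
    for i in range(n):                                 # d = P^T v (3.106)
        d[perm[i]] = v[i]
    return d


U = [[2.0, 1.0], [0.0, 2.0]]       # U^T U = [[4, 2], [2, 5]]
print(solve_with_factor(U, [0, 1], [8.0, 9.0]))
```

Because the same factor serves (3.105), (3.111), and (3.112), the routine would be called once per right-hand side, with only the gradient argument changing.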

3.6  Convergence  of  the  VO  Algorithm 

Two  convergence  properties  of  the  VO  algorithm  are  given  in  this 
section.   First  we  establish  the  class  of  functions  for  which  the 
algorithm  is  globally  convergent.   Second,  it  will  be  shown  that  the 
VO  algorithm  generates  a  sequence  with  a  high  order  of  convergence 
for most functions. When an algorithm generates sequences with a
high order of convergence, an approximation to a solution of the
minimization problem (3.2) can be computed in a small number of
iterations, if the initial point is close to the solution.

3.6.1  Global  Convergence 

The  global  convergence  of  the  VO  algorithm  will  be  established 
by  using  the  general  analysis  of  algorithms  developed  mainly  by 
Zangwill  [56].   A  brief  review  of  this  analysis  is  given  below. 
The  new  algorithm  is  then  recast   in  a  manner  which  allows  the 
results  of  this  analysis  to  be  used. 

A  minimization  algorithm  may  be  generally  described  by  a  point- 
to-set  mapping.   A  point-to-set  mapping  assigns  to  every  point 
x \in E^n a subset of E^n. Let A be a point-to-set mapping, then A(x)
may be represented by

A(x) = {y \in E^n} ,   (3.113)

where the definition of the elements y constitutes part of the
algorithm. The sequence of points {x^k} generated by the algorithm is
given by

x^{k+1} \in A(x^k) ,

beginning from some initial point x^0. The point selected from the
set A(x^k) at each iteration is also part of the details of the
algorithm. It is clear that the sequence {x^k} cannot be predicted
solely from knowledge of the initial point x^0. As the scalar search
for the VO algorithm demonstrates, similar algorithms using the same
transformation functions could implement the scalar search somewhat
differently. This difference may be enough to generate different
sequences, but as the Global Convergence Theorem will show, the
different sequences may still converge. Thus the point-to-set
mapping concept aids in analyzing classes of algorithms without
describing their steps in detail.

An important property of point-to-set mappings, which is required
later on, is that they may be closed. A point-to-set mapping
A is said to be closed at x, if the assumptions

1)  x^k \to x ,

2)  y^k \to y ,   y^k \in A(x^k) ,

imply

3)  y \in A(x) .

The closedness property is a generalization of continuity of
point-to-point mappings or ordinary functions.

The  main  result  due  to  Zangwill  may  now  be  given. 

Global Convergence Theorem. Let the point-to-set map A(x) be
an algorithm on E^n, and suppose that given x^0 the sequence {x^k} is
generated satisfying

x^{k+1} \in A(x^k) .

Let \Omega be a subset of E^n defined as the set of solution points of
the minimization problem (3.2), and suppose

1)  All points x^k are in a compact set.

2)  The function f(x) is continuous and

a)  if x \notin \Omega, then f(y) < f(x) for all y \in A(x),

b)  if x \in \Omega, then either the algorithm terminates, or
for all y \in A(x), f(y) \le f(x).

3)  The map A is closed at points outside \Omega.

Then either the algorithm stops at a solution point in \Omega, or the
limit of any convergent subsequence of {x^k} is a solution point in
\Omega.

The proof may be found in [56,34]. Condition 1 of the theorem
insures the existence of a convergent subsequence. Its violation
normally indicates that the minimization problem has no finite
solution, and thus this condition is not very restrictive. Condition
2 is normally satisfied by a suitable transformation function and a
scalar search using the terminology of the VO algorithm. Condition 3
of the theorem is usually the most challenging. For the new
algorithm, the satisfaction of this condition imposes continuity
requirements on the function and its first two derivatives, as well as
the additional condition of pseudoconvexity.

The  VO  algorithm  will  be  described  as  a  point-to-set  composition 
mapping  given  by 


A(x) = S_r(M_r(x)) ,

where M_r is a point-to-point map, and S_r is a point-to-set map. The
following lemma [56] establishes the conditions on each mapping to
yield a closed composition.

Lemma 3.2. Let M: E^n \to E^m be a point-to-point map and
S: E^m \to {y \in E^n} be a point-to-set map. Assume M is continuous
at x and S is closed at M(x). Then the point-to-set map
A(x) = S(M(x)) is closed at x.


The point-to-point mapping M_r: E^n \to E^{rn} which characterizes
the transformation phase of the VO algorithm may be described by

M_r(x) = (x, d(r)) ,   r = 2, 3, or 4 ,   (3.114a)

where d(r) denotes sets of correction terms given by

d(r) = (d_2) ,              r = 2 ,
d(r) = (d_2, d_3) ,         r = 3 ,
d(r) = (d_2, d_3, d_4) ,    r = 4 ,   (3.114b)

and the corrections d_2, d_3, and d_4 are the solutions of

[f''(x) + D] d_2 = f'(x) ,   (3.114c)

[f''(x) + D] d_3 = f'(x - d_2) ,   (3.114d)

and

[f''(x) + D] d_4 = f'(x - d_2 - d_3) .   (3.114e)


(Note that the diagonal matrix D has been included as discussed in
Section 3.5.4.) In order to make use of Lemma 3.2 we need to
establish that M_r is continuous.

Lemma 3.3. If the gradient and the hessian of f(x) are
continuous, the mapping M_r(x) given in (3.114) is continuous.

The proof is immediate since the diagonal matrix D is computed to
insure [f''(x) + D] is non-singular, and the diagonals of D are
continuous functions of the elements of the hessian f''(x) as can be
readily seen in (3.98-3.102).
Now consider the point-to-set mapping S_r: E^{rn} \to {y \in E^n}
which operates on M_r(x) and characterizes the scalar search phase
of the VO algorithm. It may be conveniently expressed as follows:

S_r(x, d(r)) = {y: y = h_r(p) for p > 0 and f(h_r(p)) < f(x)} ,
r = 2, 3, or 4,   (3.115a)

where

h_r(p) = x - d_2 p ,   r = 2,
h_r(p) = x - (3/2) d_2 p - [d_3 - (1/2) d_2] p^2 ,   r = 3,
h_r(p) = x - (11/6) d_2 p - (2 d_3 - d_2) p^2
         - [d_4 - d_3 + (1/6) d_2] p^3 ,   r = 4.   (3.115b)

Observe that (3.115b) consists of the three transformation functions
derived in Section 3.3. The following lemma establishes the conditions
that are sufficient for S_r to be closed.
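
As an illustration only, the transformation functions of (3.115b) and
the membership test of (3.115a) might be coded as follows. The function
names and the scalar test problem f(x) = exp(x) - 2x are assumptions
made for this sketch; D is taken as zero because the test problem has
a positive second derivative.

```python
import math

def h(r, p, x, d2, d3=None, d4=None):
    """Evaluate h_r(p) of (3.115b) componentwise for r = 2, 3, or 4."""
    n = len(x)
    if r == 2:
        return [x[i] - d2[i] * p for i in range(n)]
    if r == 3:
        return [x[i] - 1.5 * d2[i] * p
                - (d3[i] - 0.5 * d2[i]) * p ** 2 for i in range(n)]
    if r == 4:
        return [x[i] - (11.0 / 6.0) * d2[i] * p
                - (2.0 * d3[i] - d2[i]) * p ** 2
                - (d4[i] - d3[i] + d2[i] / 6.0) * p ** 3 for i in range(n)]
    raise ValueError("r must be 2, 3, or 4")

def in_S(f, x, y):
    """Membership test of (3.115a): y is acceptable if it lowers f."""
    return f(y) < f(x)

# Hypothetical scalar test problem (not from the text).
f = lambda z: math.exp(z[0]) - 2.0 * z[0]
grad = lambda z: math.exp(z[0]) - 2.0
hess = lambda z: math.exp(z[0])

x = [1.0]
d2 = [grad(x) / hess(x)]                                     # (3.114c), D = 0
d3 = [grad([x[0] - d2[0]]) / hess(x)]                        # (3.114d)
d4 = [grad([x[0] - d2[0] - d3[0]]) / hess(x)]                # (3.114e)

y2 = h(2, 1.0, x, d2)
y3 = h(3, 1.0, x, d2, d3)
y4 = h(4, 1.0, x, d2, d3, d4)
print(y2, y3, y4, in_S(f, x, y4))
```

At p = 1 the three functions reduce to x - d_2, x - d_2 - d_3, and
x - d_2 - d_3 - d_4, which is a convenient check on the coefficients.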


Lemma 3.4. If f'(x) \ne 0 and f(x) is continuous, the mapping S_r
given in (3.115) is closed.


Proof: Recall that S_r is closed if the conditions

1)  (x^k, d^k(r)) \to (x, d(r)) ,

2)  y^k \to y ,   y^k \in S_r(x^k, d^k(r)) ,

imply

3)  y \in S_r(x, d(r)) .

Suppose first that r = 2. Then

y^k = x^k - p_k d_2^k .   (3.116)

The assumption f'(x) \ne 0 implies that d_2^k \ne 0 for all k from
(3.114c). Thus one may write from (3.116)

p_k = \| y^k - x^k \| / \| d_2^k \| ,

which when taking limits yields

p = \| y - x \| / \| d_2 \| .

This establishes the existence of a limit for the sequence {p_k}. It
then follows that y = x - p d_2. It remains to be shown that
y \in S_2(x, d_2). For each k, p_k satisfies

f(y^k) = f(x^k - d_2^k p_k) < f(x^k) .   (3.117)

That such a p_k exists follows from the fact that

f'(x^k)^T h_2'(0) = -f'(x^k)^T d_2^k < 0 ,   (3.118)

and from Lemma 3.1. Taking limits in (3.117), and using the
assumption that f(x) is continuous yields

f(y) < f(x) ,   (3.119)

and hence y \in S_2(x, d_2). Now consider the generalization of the
preceding proof to any r. For each k, one may write

y^k - h_r^k(0) = y^k - x^k = (h_r^k)'(t p_k) p_k ,   t \in (0, 1) ,   (3.120)

using the Mean Value Theorem [45]. The assumption f'(x) \ne 0 implies
that

f(y^k) = f(h_r^k(p_k)) < f(x^k) ,   (3.121)

since

f'(x^k)^T (h_r^k)'(0) < 0 ,

and Lemma 3.1 implies the existence of a p_k which satisfies (3.121).
From the Mean Value Theorem and (3.120)

f(y^k) = f(x^k + (h_r^k)'(t p_k) p_k)
       = f(x^k) + f'(x^k + \bar{t} (h_r^k)'(t p_k) p_k)^T (h_r^k)'(t p_k) p_k ,

for some \bar{t} \in (0, 1). The above expression with (3.121) imply

(h_r^k)'(t p_k) \ne 0

in (3.120). Thus taking limits in (3.120) establishes the existence
of a p given by

p = \| y - x \| / \| h_r'(t p) \| .

Thus for each k,

f(y^k) = f(x^k + (h_r^k)'(t p_k) p_k) < f(x^k) ,

and after taking limits, and using the continuity of f(x), we obtain

f(y) < f(x) .

Hence y \in S_r(x, d(r)). This completes the proof.

The VO algorithm may now be given as the composition of the two
mappings M_r(x) given in (3.114), and S_r(x, d(r)) given in (3.115) to
be

A(x) = S_r(M_r(x)) .   (3.122)

By Lemmas 3.2, 3.3, and 3.4 the VO algorithm is closed at x, if
f'(x) \ne 0 and f(x) has continuous first and second derivatives.
This implies that, using the Global Convergence Theorem, if

1)  All x^{k+1} \in A(x^k) are in a compact set,

2)  The gradient f'(x) \ne 0, except at a solution of the
minimization problem (3.2),

3)  The function f(x) is twice continuously differentiable,

then the VO algorithm generates a sequence which converges to a
solution or it stops at a solution. Conditions 1 and 3 may not
be difficult to achieve in practice. However condition 2 implies
that the algorithm is not closed at a local maximum or at a saddle
point of f(x), since f'(x) is zero at these points. Thus
theoretically, f(x) must not have any such points; a function not
having these points is defined to be pseudoconvex [35]. In practice,
the algorithm should generate convergent sequences if the function is
pseudoconvex in the region including the desired solution and the
initial point. However, experimental evidence on one tested problem
indicates that the algorithm is superior to others in avoiding
these troublesome points even when they exist in the region of
interest.

The VO algorithm may be trivially modified to prevent convergence
to a strict local maximum. If x^k is a local maximum, the present
algorithm will fail in the sense that f'(x^k) = 0 will cause the
algorithm to stop. At this point however f''(x^k) will be negative
semidefinite [56], which is indicated in the VO algorithm by D^k \ne 0
in the modified Cholesky factorization presented in Section 3.5.
However D^k \ne 0 may also occur when the hessian is positive
semidefinite, or indefinite. Thus D^k \ne 0 is an indication of
potential problems. Thus if D^k \ne 0 at the point at which the
algorithm stops, coordinate searches may be undertaken to ascertain
whether the expressions

f(x^k + t e_i) < f(x^k) ,   i = 1, ..., n

are satisfied for some small value of t. If x^k is a strict local
maximum, the above procedure should indicate so, and x^k may then be
perturbed to start the algorithm again.
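
The coordinate searches just described can be sketched as below. This
is a rough illustration, not the author's implementation: the function
name is hypothetical, and stepping in both the +t and -t directions
along each coordinate is an assumption added here.

```python
def decreases_along_all_axes(f, x, t=1e-4):
    """Return True if f decreases for a small step of size t in both
    directions along every coordinate axis (assumed test for a strict
    local maximum at a point where the gradient vanishes)."""
    fx = f(x)
    for i in range(len(x)):
        for step in (t, -t):
            y = list(x)
            y[i] += step
            if not f(y) < fx:
                return False
    return True

# Hypothetical test functions, each with zero gradient at the origin.
hill = lambda z: -(z[0] ** 2 + z[1] ** 2)      # strict local maximum
bowl = lambda z: z[0] ** 2 + z[1] ** 2         # local minimum
saddle = lambda z: z[0] ** 2 - z[1] ** 2       # saddle point

origin = [0.0, 0.0]
is_max_hill = decreases_along_all_axes(hill, origin)
is_max_bowl = decreases_along_all_axes(bowl, origin)
is_max_saddle = decreases_along_all_axes(saddle, origin)
print(is_max_hill, is_max_bowl, is_max_saddle)
```

Only the strict maximum is flagged, so a stopping point that passes this
test would be perturbed and the algorithm restarted.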

The previous modification may not work if x^k is a saddle point
as the example in Section 3.1 shows. It must be noted however that
numerical experiments will be given in the next chapter which show
that the VO algorithm appears to be highly effective in avoiding
convergence to saddle points. For one tested problem with a saddle
point, three existing algorithms converged to the saddle point when
the initial guess was close to the saddle point, while the VO
algorithm converged to the correct solution from the same initial
point.

3.6.2  Order  of  Convergence 

The  order  of  convergence  of  the  VO  algorithm  can  be  established 
from published results dating back to Schröder in 1870, who was the
first to define the concept of order of convergence [47-48]. However,
in  this  section,  the  more  modern  results  due  to  Ortega  and  Rheinboldt 
[38]  and  Traub  [50]  will  be  used.   Two  results  are  given;  one  applies 
to  the  convergence  of  the  algorithm  to  local  minima  with  positive 
definite  hessian,  and  the  other  to  local  minima  with  positive  semi- 
definite  hessian. 

Before we can use the existing results, it must be noted that
the VO algorithm becomes the m-step method

x^{k,0} = x^k ,
x^{k,i} = x^{k,i-1} - f''(x^k)^{-1} f'(x^{k,i-1}) ,   i = 1, ..., m,
x^{k+1} = x^{k,m} ,   (3.123)

when x^k is in some neighborhood of a local minimum with a positive
definite hessian (e.g., the scalar parameter p is one). This method
was proposed and studied by Traub [50]. It has also been used by many
researchers, as it can be thought of as Newton's method for solving a
system of equations without updating the coefficient matrix at every
iteration. The following theorem establishes that the method
converges with order m+1.

Theorem 3.1. Let f(x) have continuous gradients and hessians,
and

\| f''(x) - f''(x^*) \| \le c \| x - x^* \| ,   0 < c < \infty ,

for all x in a neighborhood of x^*. Assuming f'(x^*) = 0 and
f''(x^*)^{-1} exists, the order of convergence of (3.123) is m+1.

The proof may be found in Ortega and Rheinboldt [38] and Traub
[50]. Since the transformation function of order r corresponds to
m = r - 1 steps in (3.123), the VO algorithm converges with order r,
where r is the transformation function order, whenever f''(x^*) is
positive definite.
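
The effect of the number of steps m in (3.123) can be seen on a small
scalar example. The sketch below is illustrative only: the function
m_step and the test problem f(x) = exp(x) - 2x, with minimum at
x* = ln 2, are assumptions chosen for this example.

```python
import math

def m_step(grad, hess, x, m):
    """One outer iteration of (3.123) for a scalar problem: m Newton-like
    steps that all reuse the second derivative evaluated at x^k."""
    h0 = hess(x)            # evaluated once per outer iteration
    y = x
    for _ in range(m):
        y = y - grad(y) / h0
    return y

# Hypothetical test problem: f(x) = exp(x) - 2x.
grad = lambda x: math.exp(x) - 2.0
hess = lambda x: math.exp(x)
x_star = math.log(2.0)

errors = {}
for m in (1, 2, 3):
    x = 1.0
    errs = []
    for _ in range(4):
        x = m_step(grad, hess, x, m)
        errs.append(abs(x - x_star))
    errors[m] = errs
    print(m, errs)
```

For the same starting point, the error after each outer iteration
shrinks faster as m grows, consistent with order m+1, while the second
derivative is still evaluated only once per outer iteration.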

If f''(x^*) is positive semidefinite and singular, f''(x^*)^{-1} does
not exist, and therefore the preceding theorem does not apply. In
this case the convergence is in general linear (order equal to one)
as the following argument shows. In this case the algorithm may be
described by the iteration

x^{k+1} = x^k - p [f''(x^k) + D^k]^{-1} [c f'(x^k) + g(p, x^k)] ,   (3.124)

where c, a constant, and the function g depend on the transformation
function order selected, and where the diagonal matrix D^* is not zero
at x^k = x^*. The iteration function is thus given by

G(x) = x - p [f''(x) + D]^{-1} [c f'(x) + g(p, x)] .

The derivative of the iteration function evaluated at x = x^* is given
by

G'(x^*) = I - p [f''(x^*) + D^*]^{-1} [c f''(x^*) + g'(p, x^*)] .

This matrix will not be zero for any value of p in general, since
D^* is not zero. Thus convergence cannot be higher than linear as
long as G'(x^*) \ne 0 [38]. It should be noted that while convergence
is in general linear whenever f''(x^*) is positive semidefinite, local
minima having this property may be picturesquely described as being
flat. This implies that for all x in a fairly large neighborhood of
x^*, \| f'(x) \| is very small. Thus for practical reasons, it is
normally unnecessary to compute the local minimum with great accuracy
for these cases. Two of the problems selected to test the VO
algorithm have their local minimum with a positive semidefinite
hessian. The new algorithm was more efficient in computing an
approximation to their local minimum than several published
algorithms.
3.7  Summary

The major contribution of this chapter is the derivation of a
new algorithm for finding a local minimum of an unconstrained
nonlinear function. The new algorithm, called the Variable-Order (VO)
algorithm, has two properties that no existing minimization algorithm
has. The first new property is the order of convergence. While all
existing algorithms converge with order less than or equal to two,
the new algorithm converges with variable order as high as four. The
second new property is the scalar search step of the algorithm. In
contrast with previous algorithms that have scalar searches along a
straight line in the space of the independent variables, the VO
algorithm may have scalar searches along curved trajectories. The VO
algorithm was shown to be globally convergent for pseudoconvex
functions with continuous first and second derivatives. The order of
convergence was also established to be from two to four for functions
with a positive definite hessian at the local minimum being computed.
If the hessian is positive semidefinite at the local minimum being
computed, the convergence was shown to be linear.


CHAPTER  4 
IMPLEMENTATION  OF  THE  VARIABLE-ORDER  ALGORITHM 

In this chapter we consider the practical aspects of the
implementation of the Variable-Order (VO) algorithm to nonlinear
circuit optimization problems. These practical considerations lead to
guidelines and to some modifications of the algorithm in order to
solve the general nonlinear programming problem given by


minimize     f(x)    ,  (4.1a) 


subject  to  a  set  of  nonlinear  inequality  constraints 


q(x) \le 0 ,   (4.1b)

and a set of "box" constraints given by

x^L \le x \le x^H .   (4.1c)


Observe  that  any  equality  constraint  may  be  included  in  (4.1b)  as  two 
inequality  constraints  [20]. 

In the first section, guidelines are given for handling the
nonlinear inequalities by the use of penalty functions. In circuit
optimization, the nonlinear inequalities are "loose", i.e., they can
be relaxed to some degree. Therefore, the penalty function approach,
which in effect relaxes the constraints, is an ideal technique for
circuit optimization applications.

In  the  second  section,  it  is  shown  how  the  VO  algorithm  handles 
the  box  constraints.   Unlike  the  nonlinear  inequality  constraints,  the 
box  constraints  must  be  satisfied. 

The  VO  algorithm,  as  described  in  the  last  chapter,  requires  that 
a  subroutine  be  written  which  supplies  the  value  of  the  function,  the 
gradient  and  the  hessian  at  each  point  generated  by  the  algorithm. 
While  writing  such  a  subroutine  may  not  be  difficult  for  some  problems, 
the VO algorithm would be more useful if the hessian can be
approximated when it is difficult to write a subroutine which supplies the
hessian  values.   Sometimes  it  may  be  just  as  difficult  to  supply  even 
the  gradient,  and  thus  in  this  case  both  the  gradient  and  the  hessian 
must  be  approximated.   In  the  third  section  we  consider  approximations 
to  the  hessian  and  the  gradient  when  these  values  are  not  supplied. 

The fourth section considers the case when the function and the
gradient values supplied to the VO algorithm contain errors. In
nonlinear circuit optimization applications, the subroutine that
supplies the function and the gradient values may actually be a
complex computer program which includes the solution of a system of
nonlinear algebraic and differential equations. The function and the
gradient values depend on the solutions to these equations; thus due
to the numerical techniques used, errors may be present in the
function and the gradient values. Numerical experiments will show
that if any errors present in the function and the gradient values
supplied to the VO algorithm can be estimated and kept small, the
algorithm can still be effectively employed.

The fifth section details the steps in a FORTRAN IV implementation
of the VO algorithm. Finally, the sixth section presents several
numerical experiments and comparisons with other algorithms.

4.1  Nonlinear  Inequality  Constraints 

This  section  gives  guidelines  to  be  used  in  solving  the  problem 
given  by 

minimize      f(x) ,   (4.2a)

subject to    q(x) \le 0 ,   (4.2b)


where  q(x)  is  a  vector  function  of  inequality  constraints.   It  is 
assumed  that  all  the  constraints  are  nonlinear. 

The method recommended for finding a solution of (4.2) is to
convert the constrained problem to an unconstrained one by using a
penalty function method [20,34]. That is, define the problem

minimize     \hat{f}(x) = f(x) + \mu Q(x) ,   (4.3)

where Q(x) is a penalty function for the inequality constraints; the
penalty function is defined such that it is zero whenever the point x
satisfies all the constraints, and greater than zero whenever the
point x violates at least one of the constraints. The constant \mu is
a positive scalar. For large \mu, the minimum of (4.3) will tend to be
in a region where the constraints are almost satisfied. Thus for
increasing \mu, the corresponding solution points of (4.3) approach a
solution of (4.2) [34]. Therefore, the penalty method converts the
constrained problem (4.2) into an approximately equivalent
unconstrained problem, or perhaps to a sequence of unconstrained
problems depending on how strictly the constraints are to be
satisfied. This implies that unconstrained minimization algorithms,
in particular the VO algorithm, may be used to approximate the
solution of (4.3).

Two difficulties are inherent in converting problem (4.2) to
problem (4.3), and in eventually solving problem (4.3). First, to
insure the global convergence of the VO algorithm, the objective
function of (4.3) must be twice continuously differentiable.
Therefore, it appears that penalty functions must be chosen
accordingly. Second, problem (4.3) is very ill-conditioned for large
values of the constant \mu [34]. These considerations will be explored
in more detail next.

A suitable penalty function Q(x) is given by

Q(x) = \sum_{i=1}^{n_q} w_{q_i} (max[0, q_i(x)])^3 ,   (4.4)

where n_q is the number of inequality constraints, and the constants
w_{q_i} may be used to equalize the magnitude of the constraints. The
continuity conditions on the objective function of (4.3) are satisfied
by (4.4), if the inequality constraints q_i(x), i = 1, ..., n_q, also
satisfy them. In the literature, the most popular penalty function
for inequality constraints is given by

Q(x) = \sum_{i=1}^{n_q} w_{q_i} (max[0, q_i(x)])^2 .   (4.5)

This quadratic penalty function has a hessian which is discontinuous
whenever an inequality constraint is zero [34]. However, it was found
experimentally that for several problems tested, this quadratic
penalty function produces an unconstrained problem which is solved by
the VO algorithm more efficiently than the one produced by the cubic
penalty function (4.4), especially when the hessian is not supplied
and thus approximated by differences. Note that the VO algorithm is
not guaranteed to be globally convergent for the quadratic penalty
function (4.5) because of its discontinuous hessian.
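
As a brief illustration, the two penalty functions (4.4) and (4.5)
differ only in the exponent, so one routine can cover both. The names
penalty and penalized below are hypothetical, and the constraint values
are made up for the example.

```python
def penalty(q_values, weights, power):
    """Q(x) of (4.4) for power = 3 or of (4.5) for power = 2, given the
    constraint values q_i(x) and the weights w_{q_i}."""
    return sum(w * max(0.0, q) ** power for w, q in zip(weights, q_values))

def penalized(f, q, mu, weights, power=3):
    """Return the objective of (4.3): f(x) + mu * Q(x)."""
    return lambda x: f(x) + mu * penalty(q(x), weights, power)

# All constraints satisfied (q_i <= 0): the penalty vanishes.
q_feasible = penalty([-1.0, 0.0], [1.0, 1.0], 3)
# One constraint violated by 2 with weight 0.5.
q_quadratic = penalty([2.0], [0.5], 2)     # 0.5 * 2**2 = 2.0
q_cubic = penalty([2.0], [0.5], 3)         # 0.5 * 2**3 = 4.0

# Illustrative one-variable problem: f(x) = (x - 2)^2 with q(x) = x - 1.
obj = penalized(lambda x: (x - 2.0) ** 2, lambda x: [x - 1.0], 10.0, [1.0], 2)
print(q_feasible, q_quadratic, q_cubic, obj(2.0))
```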

The VO algorithm, like other existing algorithms [34], solves
problem (4.3) for large \mu with great difficulty, as examples will
show. It was found that the best approach was to compute rough
approximations to the solution of a sequence of problems, given by
(4.3), for increasing values of \mu, and to tighten the desired
accuracy of the solution for the last \mu used. Thus, it is
recommended that initially a small value of \mu be used when using the
VO algorithm for solving problems such as (4.3).
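
The recommended strategy can be sketched on a one-variable problem.
Everything specific here is an assumption for illustration: the helper
names, the tolerances, the sequence of \mu values, and the use of a
simple ternary search as a stand-in for the VO algorithm.

```python
def ternary_min(phi, lo, hi, tol):
    """Minimize a convex scalar function on [lo, hi] to within tol.
    Stand-in for the unconstrained minimizer (not the VO algorithm)."""
    while hi - lo > tol:
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if phi(a) < phi(b):
            hi = b
        else:
            lo = a
    return 0.5 * (lo + hi)

def solve_with_increasing_mu(f, q, mus, x0, rough_tol=1e-2,
                             final_tol=1e-8, radius=2.0):
    """Solve (4.3) roughly for each mu in turn, starting each search
    around the previous solution, and tighten only for the last mu."""
    x = x0
    history = []
    for k, mu in enumerate(mus):
        tol = final_tol if k == len(mus) - 1 else rough_tol
        # quadratic penalty (4.5) with a single constraint and unit weight
        phi = lambda z, mu=mu: f(z) + mu * max(0.0, q(z)) ** 2
        x = ternary_min(phi, x - radius, x + radius, tol)
        history.append((mu, x))
    return x, history

# Illustrative problem: minimize (x - 2)^2 subject to x - 1 <= 0.
f = lambda z: (z - 2.0) ** 2
q = lambda z: z - 1.0
x_final, history = solve_with_increasing_mu(f, q, [1.0, 10.0, 100.0, 1000.0], x0=2.0)
print(history)
```

For this problem the minimizer of (4.3) is (2 + \mu)/(1 + \mu), so the
iterates approach the constrained solution x = 1 from the infeasible
side as \mu grows.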

4.2  Box  Constraints 
When  the  minimization  problem  has  box  constraints  of  the  form 


x^L \le x \le x^H ,   (4.6)

the penalty function method just described can be used, if (4.6) is
rearranged into 2n inequality constraints. However, there are two
reasons that compel the use of a different technique for handling the
box constraints. First, the penalty function method allows the
violation of constraints, particularly when the multiplying constant
\mu is small. In circuit optimization procedures, the circuit
equations may not have any solution if any of the box constraints are
violated, and therefore the algorithm may fail if a box constraint is
violated. Second, the box constraints are linear constraints and
their effect can be very efficiently handled in a direct manner with
some modifications to the VO algorithm.

The method proposed for handling the constraints is, in effect, to
project the transformation function trajectories onto the active box
constraints whenever the trajectories are outside of the box
constraints. Figure 4.1 illustrates the proposed technique. This
projection can be very efficiently implemented as will be shown.
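
One simple way to realize such a projection, assumed here for
illustration and not necessarily the implementation described later, is
to clip each component of the trajectory point to its bounds. The
function names and numbers below are hypothetical.

```python
def project_onto_box(y, lower, upper):
    """Clip each component of y to the interval [lower_i, upper_i]."""
    return [min(max(yi, lo), hi) for yi, lo, hi in zip(y, lower, upper)]

def projected_trajectory(h, lower, upper):
    """Wrap a transformation function h(p) so that every point it
    returns satisfies the box constraints (4.6)."""
    return lambda p: project_onto_box(h(p), lower, upper)

# Second-order trajectory h_2(p) = x - d_2 p with made-up values.
x = [0.5, 0.5]
d2 = [-1.0, 0.2]
h2 = lambda p: [xi - di * p for xi, di in zip(x, d2)]

hp = projected_trajectory(h2, [0.0, 0.0], [1.0, 1.0])
inside = hp(0.25)     # unprojected point is already inside the box
clipped = hp(1.0)     # first component is held at its upper bound
print(inside, clipped)
```

Points inside the box are left unchanged, so the projected trajectory
coincides with the original one wherever no constraint is active.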

A modification required in the implementation of the projection
of the transformations is that whenever the trajectory is on a boundary,
an accurate computation of the solution of the scalar minimization
problem in the scalar search should be done. The reason for this
modification is that when transformation functions are projected, their
theoretical properties may be different when the scalar parameter p in
the transformation functions is set to one.

4.3  Hessian  and  Gradient  Approximations 

To this juncture the VO algorithm was described in a way that
required supplying the function, the gradient, and the hessian of the
function to be minimized at the points of the sequence {x^k} generated
by the algorithm. In this section approximations to the hessian and
the gradient will be presented because there may be times when
supplying the gradient and/or the hessian is impractical. We consider
two cases. The first case is when the function and the gradient
values are supplied, and the second case is when only the function
values are supplied.

Figure 4.1  Illustration of projection of trajectory onto box
constraints. Box constraints are depicted by the rectangle.
The trajectory is shown by the dashed curve to be outside
of the box constraints over two intervals. The actual
trajectory used is shown by the solid curve with arrows.

4.3.1  Function  and  Gradient  Values  Supplied 

When the function and the gradient values are supplied, the
hessian needs to be approximated. It was initially felt that a
quasi-Newton scheme [7,22], which builds an estimate of the hessian
inverse by conjugate directions, would be an ideal approach. These
methods have proven to be very reliable in minimization algorithms
[30-31]. However, the approximation to the hessian inverse is not
sufficiently accurate with a quasi-Newton method until the iterates
are within a very small neighborhood about the solution. For example,
it was found experimentally that the use of the approximations to the
hessian inverse in the higher-order transformations when far from the
solution resulted in no improvement over the algorithm of Fletcher and
Powell [22]. Since numerical experiments with the hessian values
supplied indicated that the VO algorithm was more efficient than
existing algorithms, approximating the hessian by difference methods,
as described next, was undertaken with good results.

Ortega and Rheinboldt [38] prove that as long as the hessian is
continuous, non-singular, and satisfies a Lipschitz condition, a
difference approximation to the hessian will keep the high order of
convergence for Newton's iteration if the perturbations used in the
difference approximations approach zero in the limit. For example,
the approximation to the j-th column of the hessian given by

[f''(x^k)]_j \approx (1/b_j^k) [f'(x^k + b_j^k e_j) - f'(x^k)] ,   j = 1, ..., n,   (4.7)

maintains the high order of convergence for Newton's iteration provided
that the vector b^k of the perturbations b_j^k, j = 1, ..., n, satisfies

\| b^k \| \le c \| f'(x^k) \| ,   for all k \ge k_0 ,   (4.8)

where k_0 \ge 0 and c is a constant in the range 0 < c \le 1. The
relation (4.8) implies that \| b^k \| should become small as the
solution x^* is approached. More details on the technique for
computing the perturbations will be given later.
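
A minimal sketch of the forward-difference approximation (4.7) follows.
The function name, the fixed perturbation size, and the quadratic test
function are assumptions for illustration; the rule (4.8) for choosing
the perturbations is not implemented here.

```python
def hessian_columns_fd(grad, x, b):
    """Approximate the hessian column by column using (4.7):
    column j is [grad(x + b_j e_j) - grad(x)] / b_j."""
    n = len(x)
    g0 = grad(x)
    cols = []
    for j in range(n):
        xp = list(x)
        xp[j] += b[j]                       # perturb coordinate j only
        gj = grad(xp)
        cols.append([(gj[i] - g0[i]) / b[j] for i in range(n)])
    # cols[j][i] approximates element (i, j); return in row-major form
    return [[cols[j][i] for j in range(n)] for i in range(n)]

# Hypothetical test: f = x^2 + 3xy + 2y^2, whose hessian is [[2, 3], [3, 4]].
grad = lambda z: [2.0 * z[0] + 3.0 * z[1], 3.0 * z[0] + 4.0 * z[1]]
H = hessian_columns_fd(grad, [1.0, 2.0], [1e-3, 1e-3])
print(H)
```

For a quadratic function the gradient is linear, so the forward
difference is exact apart from round-off; for general functions the
error is on the order of the perturbation size.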

While (4.7) is an approximation with error on the order of |b_j^k|,
assuming the third derivative of f is bounded [33], the diagonal
elements of the hessian may be approximated with error on the order of
(b_j^k)^2 if one assumes that the function value is evaluated at each
point that the gradient is evaluated, which is the normal case. After
fitting a Hermite cubic polynomial [55] on b_j through the function
and the gradient computed for b_j = 0, and b_j = b_j^k, one can
differentiate twice and evaluate at b_j = 0 to obtain

[f''(x^k)]_{jj} \approx 6 [f(x^k + b_j^k e_j) - f(x^k)] / (b_j^k)^2
                - 2 [f'(x^k + b_j^k e_j)]_j / b_j^k - 4 [f'(x^k)]_j / b_j^k ,   (4.9)

which is an approximation with error on the order of (b_j^k)^2
assuming that the fourth derivative of f is bounded [55]. Since the
hessian is symmetric for the functions being considered, the
off-diagonal elements may be approximated as the average of the two
elements computed from (4.7) (note that this averaging does not
require any additional gradient evaluations). Thus all off-diagonal
elements (i \ne j) of the hessian are approximated by

[f''(x^k)]_{ij} \approx [f'(x^k + b_j^k e_j)]_i / (2 b_j^k) + [f'(x^k + b_i^k e_i)]_j / (2 b_i^k)
                - [f'(x^k)]_i / (2 b_j^k) - [f'(x^k)]_j / (2 b_i^k) .   (4.10)


To  approximate  the  hessian  using  (4.9)  and  (4.10)  requires  n 
additional  function  and  gradient  evaluations  per  iteration  whether  we 
use  the  above  averaging  or  not.   Even  with  these  additional  evaluations, 
we  found  (as  will  be  seen)  the  VO  algorithm  to  be  very  competitive  with 
several  other  existing  algorithms  in  the  total  number  of  function  and 
gradient  evaluations  required  for  several  minimization  problems. 
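
As an illustration of (4.9) and (4.10), the following Python sketch builds the approximate hessian from one extra function and gradient evaluation per coordinate. The function name `approx_hessian` and its argument layout are assumptions made for this example; they are not taken from the FORTRAN program of Appendix I.

```python
def approx_hessian(f, grad, x, b):
    """Approximate the hessian at x from n extra function/gradient evaluations.

    Diagonal entries use the Hermite-cubic formula (4.9); off-diagonal
    entries average the two forward differences of (4.7), as in (4.10).
    f(x) returns a float, grad(x) returns a list, b holds the perturbations.
    """
    n = len(x)
    f0, g0 = f(x), grad(x)
    fs, gs = [], []
    for j in range(n):
        xp = list(x)
        xp[j] += b[j]            # perturbed point x + b_j e^j
        fs.append(f(xp))
        gs.append(grad(xp))
    H = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # (4.9): second derivative of the Hermite cubic at b_j = 0
        H[j][j] = (6.0 * (fs[j] - f0) / b[j] ** 2
                   - 2.0 * gs[j][j] / b[j]
                   - 4.0 * g0[j] / b[j])
        for i in range(j):
            # (4.10): average of the (i, j) and (j, i) estimates from (4.7)
            hij = ((gs[j][i] - g0[i]) / (2.0 * b[j])
                   + (gs[i][j] - g0[j]) / (2.0 * b[i]))
            H[i][j] = H[j][i] = hij
    return H
```

For a cubic along each coordinate the diagonal formula is exact up to round-off, which gives a convenient check of an implementation.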

The perturbations $b_j^k$ used in (4.9) and (4.10) have two normally
conflicting requirements. First, they must be sufficiently small in
order to satisfy (4.8) and to produce accurate approximations. Second,
the perturbations must be sufficiently large in order to avoid round-off
errors in the differences present in (4.9) and (4.10). Therefore,
the simple implementation suggested by (4.8) may not be adequate. For
round-off error reasons, it is proposed that each perturbation $b_j^k$ be
such that

$$|[f'(x^k + b_j^k e^j)]_j - [f'(x^k)]_j| \approx \epsilon_a + \epsilon_r\,|[f'(x^k)]_j|, \tag{4.11}$$

where $\epsilon_a > 0$ and $\epsilon_r > 0$ are constants which should
reflect the error in the evaluation of the gradient terms supplied and
the word length of the computer. If the evaluation has insignificant
error, $\epsilon_a = 10^{-s/2}$ and $\epsilon_r = 10^{-s/3}$ gave good
results, where s equals the number of significant digits in the numbers
of the computer. The case where errors are present in the supplied
function and gradient values will be considered in Section 4.4. It will
now be shown that (4.11) may be used to yield a desirable perturbation
size. The Mean Value Theorem [45] yields the relationship

$$[f'(x^k + b_j^k e^j)]_j - [f'(x^k)]_j = b_j^k\,[f''(x^k + t\,b_j^k e^j)]_{jj}, \tag{4.12}$$

for $t \in (0,1)$. Using (4.11) and $[f''(x^{k-1})]_{jj}$ as an
approximation to $[f''(x^k + t\,b_j^k e^j)]_{jj}$ in (4.12) yields

$$b_j^k \approx B_j^D = [\epsilon_a + \epsilon_r\,|[f'(x^k)]_j|]\,/\,|[f''(x^{k-1})]_{jj}|, \quad k > 0. \tag{4.13}$$

This expression is well defined in some neighborhood of a strict local
minimum since the gradient terms are small and the hessian is positive
definite. Furthermore, for sufficiently small $\epsilon_a$, (4.13)
satisfies (4.8). However, when $x^k$ is far from a solution, $B_j^D$ may
not exist or it may be very large. Also a different expression is
required for k = 0. For these reasons an argument similar to that used
for (4.11) and (4.12), but this time for the function, yields

$$b_j^k \approx B_j^F = [\epsilon_a + \epsilon_r\,|f(x^k)|]\,/\,|[f'(x^k)]_j|, \tag{4.14}$$

for a desired difference

$$|f(x^k + b_j^k e^j) - f(x^k)| \approx \epsilon_a + \epsilon_r\,|f(x^k)|. \tag{4.15}$$

Certainly, (4.14) is not defined whenever a gradient term is zero, but
in that case (4.13) may be defined. If neither (4.13) nor (4.14) is
defined, a third option is given by

$$b_j^k \approx B_j^X = \epsilon_r\,(1 + |x_j^k|). \tag{4.16}$$


The computation of the perturbation $b_j^k$ may now be summarized by the
following expression

$$b_j^k = \begin{cases} \epsilon_a + \min\{B_j^F, B_j^X\}, & k = 0, \\ \epsilon_a + \min\{B_j^D, B_j^F, B_j^X\}, & k > 0, \end{cases} \tag{4.17}$$

where $\epsilon_a$ is added in (4.17) to ensure a non-zero $b_j^k$, and
$B_j^D$, $B_j^F$, and $B_j^X$ are given in (4.13), (4.14), and (4.16)
respectively. Observe that due to the condition (4.8) one would not
expect the VO algorithm employing (4.17) in the approximation of the
hessian to converge very rapidly once $\|f'(x^k)\| < \epsilon_a$. However,
this expected behavior was not in total agreement with the numerical
experiments on several functions tested. In fact, the accuracy of the
approximation of the hessian was so uniformly excellent throughout the
iterations that in general the VO algorithm generated sequences
$\{x^k\}$ very similar to those generated using supplied hessians. The
apparent reason for the improved behavior is the increased accuracy of
the hessian elements by the use of (4.9) and (4.10) instead of (4.7).
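
The selection rule (4.13) through (4.17) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name `perturbation` is hypothetical, a candidate that is undefined (zero denominator, or no previous hessian at k = 0) is simply skipped, and the factor in (4.16) is taken to be the relative constant.

```python
def perturbation(k, fx, gj, xj, hjj_prev, eps_a, eps_r):
    """Perturbation b_j^k for the gradient-supplied case, after (4.17).

    k        iteration counter
    fx       f(x^k)
    gj       j-th gradient component at x^k
    xj       j-th coordinate of x^k
    hjj_prev j-th hessian diagonal from iteration k-1 (None when k = 0)
    """
    candidates = []
    if k > 0 and hjj_prev is not None and hjj_prev != 0.0:
        # (4.13): derived from the desired gradient difference (4.11)
        candidates.append((eps_a + eps_r * abs(gj)) / abs(hjj_prev))
    if gj != 0.0:
        # (4.14): derived from the desired function difference (4.15)
        candidates.append((eps_a + eps_r * abs(fx)) / abs(gj))
    # (4.16): always defined; relative constant assumed here
    candidates.append(eps_r * (1.0 + abs(xj)))
    # eps_a is added so that the perturbation is never zero
    return eps_a + min(candidates)
```

For example, with `eps_a = 1e-6` and `eps_r = 1e-4`, a steep gradient component makes (4.14) the smallest candidate at k = 0, while a large previous curvature makes (4.13) the smallest candidate for k > 0.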


4.3.2  Only  Function  Values  Supplied 

When  only  the  function  values  are  supplied,  both  the  hessian  and 
the  gradient  must  be  approximated  in  order  to  use  the  VO  algorithm.   As 
in  Section  4.3.1,  the  approximations  are  considered  first,  followed  by 
the  method  of  computing  the  perturbations. 

The approximations proposed for the hessian matrix are the following.
The diagonal elements of the hessian are approximated by

$$[f''(x^k)]_{jj} \approx [f(x^k + b_j^k e^j) - 2f(x^k) + f(x^k - b_j^k e^j)]/(b_j^k)^2, \tag{4.18}$$

which is obtained by twice differentiating the quadratic in $b_j$ fitted
through the function values for $b_j$ equal to $-b_j^k$, 0, and $b_j^k$;
(4.18) follows for $b_j = 0$. The error in the approximation (4.18) is of
order $|b_j^k|$, assuming the third derivative of f is bounded [55]. In
order to approximate all the diagonals of the hessian using (4.18), 2n
additional function evaluations are required. The off-diagonal elements
are approximated by

$$[f''(x^k)]_{ij} \approx [f(x^k + b_i^k e^i + b_j^k e^j) + f(x^k) - f(x^k + b_i^k e^i) - f(x^k + b_j^k e^j)]/(b_i^k b_j^k), \tag{4.19}$$

which is an approximation with error on the order of $|b_i^k| + |b_j^k|$,
assuming the third derivative of f is bounded. This approximation may
be obtained from a combination of the three Taylor series expansions
of $f(x^k + b_i^k e^i)$, $f(x^k + b_j^k e^j)$, and
$f(x^k + b_i^k e^i + b_j^k e^j)$. The computation of the off-diagonal
elements using (4.19) requires one additional function evaluation for
each off-diagonal element after considering those function values
already available from the approximation of the diagonal elements from
(4.18); thus a total of $(n^2 - n)/2$ additional function evaluations is
required to approximate all the off-diagonal elements. Therefore, the
entire hessian matrix may be approximated, using (4.18) and (4.19), with
$(n^2 + 3n)/2$ additional function evaluations.
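
The evaluation count above can be checked with a short Python sketch of (4.18) and (4.19). The name `hessian_from_values` and the returned call counter are assumptions introduced for illustration; the value f(x^k) is treated as already available and is not counted.

```python
def hessian_from_values(f, x, b):
    """Hessian approximation from function values only, after (4.18)-(4.19).

    Returns (H, calls), where calls counts the additional evaluations of f
    beyond f(x) itself.
    """
    n = len(x)
    calls = 0

    def shifted(steps):
        # Evaluate f at x displaced by the given (index, step) pairs.
        nonlocal calls
        calls += 1
        y = list(x)
        for idx, s in steps:
            y[idx] += s
        return f(y)

    f0 = f(x)                                        # assumed already known
    fp = [shifted([(j, b[j])]) for j in range(n)]    # f(x + b_j e^j)
    fm = [shifted([(j, -b[j])]) for j in range(n)]   # f(x - b_j e^j)
    H = [[0.0] * n for _ in range(n)]
    for j in range(n):
        H[j][j] = (fp[j] - 2.0 * f0 + fm[j]) / b[j] ** 2          # (4.18)
        for i in range(j):
            fij = shifted([(i, b[i]), (j, b[j])])
            H[i][j] = H[j][i] = (fij + f0 - fp[i] - fp[j]) / (b[i] * b[j])  # (4.19)
    return H, calls
```

For n = 2 the counter comes out as 5, in agreement with (n^2 + 3n)/2.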

Each element of the gradient vector may be approximated with the
central difference approximation given by

$$[f'(x^k)]_j \approx [f(x^k + b_j^k e^j) - f(x^k - b_j^k e^j)]/(2b_j^k), \tag{4.20}$$

with the required function values already available from the hessian
approximation. The approximation (4.20) has error on the order of
$(b_j^k)^2$, assuming the third derivative of f is bounded [55]. In the
computation of the transformation functions the gradient is required
at several points in the neighborhood of $x^k$. For example, the
approximation to the third-order correction (3.111) requires

$$f'(x_2^k) = f'(h_2^k(1)) = f'(x^k - d_2^k).$$

Using (4.20) to estimate the elements of $f'(x_2^k)$ would require 2n
additional function evaluations. However, without sacrificing too much
accuracy, it will now be shown that only n additional function
evaluations are required. First rewrite (4.20) as follows

$$[f'(x^k)]_j \approx [f(x^k + b_j^k e^j) - f(x^k)]/b_j^k - b_j^k\,[f''(x^k)]_{jj}/2. \tag{4.21}$$

This expression is identical to (4.20) if $[f''(x^k)]_{jj}$ is given by
(4.18). Observe that the first term in (4.21) is the forward difference
approximation to the gradient term [55] which, unlike (4.18), requires
only one additional function evaluation. The approximation (4.21) is
the previously known [49] quadratic approximation to the gradient terms
which may be derived from the Taylor series expansion of
$f(x^k + b_j^k e^j)$. Therefore, for the approximation to the elements of
$f'(x_2^k)$ the following is proposed

$$[f'(x_2^k)]_j \approx [f(x_2^k + b_{2j}^k e^j) - f(x_2^k)]/b_{2j}^k - b_{2j}^k\,[f''(x^k)]_{jj}/2, \tag{4.22}$$

where $[f''(x^k)]_{jj}$ is given by (4.18). If $[f''(x_2^k)]_{jj}$ were
used instead, the error in (4.22) would be on the order of
$(b_{2j}^k)^2$. However, since $x_2^k = x^k - d_2^k$, using
$[f''(x^k)]_{jj}$ makes the error in (4.22) on the order of
$|b_{2j}^k b_j^k| + |b_{2j}^k|\,\|d_2^k\|$, assuming the third derivative
of f is bounded. Since the need to compute $f'(x_2^k)$ arises from the
assumption that $\|d_2^k\|$ is small, the approximation (4.22) is expected
to have comparable accuracy with the computationally more costly central
difference approximation. Numerical experiments have indeed verified
this expectation. A similar approximation is used for all other
gradients required in the neighborhood of $x^k$.

The attention now turns towards the computation of suitable
perturbations. As noted earlier, the perturbations must be computed
taking round-off errors into consideration. Additionally, it is desired
to have the perturbations small to ensure small errors in the
approximations. With these considerations, it is proposed that each
perturbation $b_j^k$ be such that (4.15) and

$$|f(x^k - b_j^k e^j) - f(x^k)| \approx \epsilon_a + \epsilon_r\,|f(x^k)| \tag{4.23}$$

are satisfied. Using (4.15) and (4.23), from (4.18)

$$(b_j^k)^2\,|[f''(x^k)]_{jj}| \approx 2\,[\epsilon_a + \epsilon_r\,|f(x^k)|]. \tag{4.24}$$

This expression may be solved for $b_j^k$ using $[f''(x^{k-1})]_{jj}$ to
obtain

$$b_j^k \approx B_j^F = \{2\,[\epsilon_a + \epsilon_r\,|f(x^k)|]\,/\,|[f''(x^{k-1})]_{jj}|\}^{1/2}, \quad k > 0. \tag{4.25}$$

In some neighborhood of a strict local minimum, $B_j^F$ is well defined.
However, far from a solution, $B_j^F$ may not exist, and also for k = 0
another approach is required. Therefore, define

$$B_j^X = \epsilon_a + \epsilon_r \min\{|f(x^k)|, |x_j^k|\}, \tag{4.26}$$

as a second option for $b_j^k$. Then the computation of $b_j^k$ is given
by the following expression

$$b_j^k = \begin{cases} \epsilon_a + B_j^X, & k = 0, \\ \epsilon_a + \min\{B_j^F, B_j^X\}, & k > 0. \end{cases} \tag{4.27}$$

Observe that $\epsilon_r$ may have to be smaller than for the preceding
case when the gradient and the function were supplied. In particular,
if $|f(x^k)|$ is very large for all k, (4.25) and (4.26) may yield a large
value for $B_j^F$ and $B_j^X$ if $\epsilon_r$ is not made sufficiently
small. For function values with range $|f(x^k)| < 10^{?}$, and with
$|f(x^*)| \le 10$, the choice $\epsilon_r = 5 \times 10^{-?}$ gave good
experimental results (the exponents are illegible in the source). Even
with the additional function evaluations required for the proposed
approximations, the VO algorithm compared very favorably with other
existing algorithms that use only function values.
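
The gradient approximation (4.22) used earlier in this section, a forward difference corrected by a hessian diagonal borrowed from a nearby point, can be sketched in Python as below. The function name `corrected_forward_gradient` and the argument `hdiag` are assumptions for this example; `hdiag` stands for the diagonal elements obtained from (4.18) at x^k.

```python
def corrected_forward_gradient(f, y, b, hdiag):
    """Gradient estimate at y with n extra function values, after (4.22).

    f      function returning a float
    y      point at which the gradient is wanted (for example x_2^k)
    b      perturbations, one per coordinate
    hdiag  hessian diagonal taken from a nearby point (for example x^k)
    """
    fy = f(y)
    g = []
    for j in range(len(y)):
        yp = list(y)
        yp[j] += b[j]
        forward = (f(yp) - fy) / b[j]            # forward difference term
        g.append(forward - 0.5 * b[j] * hdiag[j])  # curvature correction
    return g
```

For a quadratic function with the exact diagonal supplied, the correction removes the forward-difference error entirely, so the result matches the exact gradient up to round-off.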


4.4   Supplied  Function  and  Gradient  Values  with  Errors 

When  errors  are  present  in  the  evaluation  of  the  function  and/or 
the  gradient  values  supplied,  the  VO  algorithm  can  still  be  effectively 
employed,  if  the  magnitude  of  the  errors  is  not  too  large  and  can  be 
estimated,  and  if  some  modifications  are  made  to  the  algorithm. 

We begin by assuming that each supplied gradient component has
error in its evaluation given by

$$|[\bar{f}'(x^k)]_j - [f'(x^k)]_j| \le \epsilon_{ga} + \epsilon_{gr}\,|[f'(x^k)]_j|, \tag{4.28}$$

where $f'(x^k)$ is the exact gradient, $\bar{f}'(x^k)$ is the supplied
gradient, $\epsilon_{ga}$ is the absolute error, and $\epsilon_{gr}$ is the
relative error. Similarly, each supplied function value has error given
by

$$|\bar{f}(x^k) - f(x^k)| \le \epsilon_{fa} + \epsilon_{fr}\,|f(x^k)|, \tag{4.29}$$

where $\epsilon_{fa}$ and $\epsilon_{fr}$ are the absolute and relative
errors, respectively.

In  computer-aided  optimization  of  circuits,  the  absolute  and 
relative  errors  present  in  the  supplied  function  and  gradient  evalua- 
tions can  usually  be  estimated.   For  example,  in  dc  optimization,  the 
absolute  and  relative  errors  are  normally  related  to  the  absolute  and 
relative  constants  used  in  determining  convergence  to  the  solution  of 
the  nonlinear  circuit  equations.   That  is,  the  function  and  the  gra- 
dient values  depend  on  the  solution  of  nonlinear  equations.   In  solving 
the  nonlinear  equations,  an  iterative  method  is  employed  which  stops 
when  the  difference  between  two  successive  iterates  is  less  than  an 
absolute  constant  plus  a  relative  constant  times  the  magnitude  of  the 
iterate. 

There are two modifications which improve the effectiveness of
the VO algorithm for problems that have errors given by (4.28) and
(4.29). First, the perturbations used in the difference approximations
to the hessian and the gradient should be computed considering
the errors present. Second, if the transformation functions generate
trajectories which yield little or no decrease in the function value
at a particular iteration, an alternate approach should be used.

The perturbations used in the difference approximations to the
hessian were derived in the preceding section. For round-off reasons,
the perturbations $b_j^k$ used in the approximations are computed such
that

$$|[f'(x^k + b_j^k e^j)]_j - [f'(x^k)]_j| \approx \epsilon_a + \epsilon_r\,|[f'(x^k)]_j| \tag{4.30}$$

whenever the gradient values are supplied. With errors present in the
evaluation of the gradient, it is desired that

$$|[f'(x^k + b_j^k e^j)]_j - [f'(x^k)]_j| \gg \epsilon_{ga} + \epsilon_{gr}\,|[f'(x^k)]_j|, \tag{4.31}$$

and that the error in the diagonal element of the approximate hessian
be small. The condition (4.31) may be accomplished by initially
setting $\epsilon_a = 200\,\epsilon_{ga}$ and
$\epsilon_r = 200\,\epsilon_{gr}$ in (4.30). The values of $\epsilon_a$
and $\epsilon_r$ are then adjusted after each iteration if the estimated
error in the diagonal element of the hessian is too large. Each diagonal
element of the hessian is approximated by

$$[f''(x^k)]_{jj} \approx ([f'(x^k + b_j^k e^j)]_j - [f'(x^k)]_j)/b_j^k, \tag{4.32}$$

instead of (4.9) because (4.32) is less sensitive to the errors in the
supplied gradient. The perturbation $b_j^k$ in (4.32) is a function of
$\epsilon_a$ and $\epsilon_r$ as described in the preceding section. The
error in (4.32) due to (4.28) is less than $E([f''(x^k)]_{jj})$, where

$$E([f''(x^k)]_{jj}) = (2\epsilon_{ga} + \epsilon_{gr}\,(|[f'(x^k + b_j^k e^j)]_j| + |[f'(x^k)]_j|))/b_j^k.$$

Now, if

$$E([f''(x^k)]_{jj}) > 5 \times 10^{-?} + 5 \times 10^{-?}\,|[f''(x^k)]_{jj}|$$

(the exponents are illegible in the source), it is concluded that the
perturbation used was too small since the error in the approximated
diagonal element may be large. Therefore, $\epsilon_a$ is set to
$10\,\epsilon_a$, and $\epsilon_r$ is set to $2\,\epsilon_r$. On the other
hand, if

$$E([f''(x^k)]_{jj}) < 5 \times 10^{?} + 5 \times 10^{?}\,|[f''(x^k)]_{jj}|$$

(exponents again illegible), then the perturbation used was too large
since the estimated error in the approximation is small. In this case,
$\epsilon_a$ is set to $10^{-1}\epsilon_a$, and $\epsilon_r$ is set to
$\epsilon_r/2$. This procedure monitors the maximum error in the
diagonals of the hessian at each iteration and in effect adjusts the
perturbations in an attempt to keep them small provided that the error
is not large. When only function values are available, a similar
technique is implemented.
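
The monitoring procedure can be sketched in Python as follows. Because the threshold exponents cannot be read in the source, the thresholds are passed in as the hypothetical arguments `tol_hi` and `tol_lo`, each an (absolute, relative) pair; the function names are also assumptions, and the factor 1/10 for shrinking the absolute constant is this sketch's reading of the text.

```python
def hessian_diag_error_bound(g_pert_j, g_j, b_j, eps_ga, eps_gr):
    """Bound E on the error of (4.32) caused by gradient errors (4.28)."""
    return (2.0 * eps_ga + eps_gr * (abs(g_pert_j) + abs(g_j))) / b_j


def adjust_eps(eps_a, eps_r, err, hjj, tol_hi, tol_lo):
    """Return updated (eps_a, eps_r) after one iteration.

    tol_hi, tol_lo: hypothetical (absolute, relative) threshold pairs
    standing in for the constants that are illegible in the source.
    """
    if err > tol_hi[0] + tol_hi[1] * abs(hjj):
        # Estimated error too large: perturbation was too small, enlarge it.
        return 10.0 * eps_a, 2.0 * eps_r
    if err < tol_lo[0] + tol_lo[1] * abs(hjj):
        # Estimated error small: perturbation can be reduced.
        return eps_a / 10.0, eps_r / 2.0
    return eps_a, eps_r
```

A caller would evaluate the bound for each diagonal element, take the largest value, and pass it to `adjust_eps` together with the corresponding diagonal element.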

The off-diagonal elements of the hessian may also have significant
error. It was observed in numerical experiments that averaging the
two values of the (i,j) and (j,i) off-diagonal element approximations
as proposed in (4.10) sometimes did not work. For example, the (i,j)
approximation may be several orders of magnitude larger or smaller than
the (j,i) approximation. Whenever this large difference occurs, it is
believed that it is best to set the (i,j) off-diagonal element to the
smaller of the two approximations.
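
A minimal Python sketch of this safeguard is given below. The name `combine_off_diagonal` and the parameter `ratio_limit` are hypothetical: the text says only "several orders of magnitude", so the value 100 is a placeholder, and "smaller" is interpreted here as smaller in absolute value.

```python
def combine_off_diagonal(h_ij, h_ji, ratio_limit=100.0):
    """Combine the (i, j) and (j, i) estimates of one off-diagonal element.

    The two estimates are averaged as in (4.10) unless their magnitudes
    differ by more than ratio_limit, in which case the one of smaller
    magnitude is returned.
    """
    lo, hi = sorted((h_ij, h_ji), key=abs)
    if abs(hi) > ratio_limit * abs(lo):
        return lo
    return 0.5 * (h_ij + h_ji)
```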

The second modification to the VO algorithm stems from the fact
that it is possible that the trajectory generated by the transformation
function selected does not reduce the value of the function being
minimized, even when the current point $x^k$ is far from a solution.
That is,

$$f(h_r^k(p)) < f(x^k), \quad r = 2, 3, \text{ or } 4,$$

may not be satisfied, or may be satisfied only for a very small value of
p > 0, which would yield $x^{k+1}$ very close to $x^k$. In effect, this
implies that the transformation functions may yield poor trajectories
when large errors are present in the function and the gradient.
Therefore, when the progress at an iteration is small, the gradient
direction is used or cyclic coordinate searches are undertaken until
another point sufficiently far away is obtained; if another point cannot
be found which reduces the value of the function, it is assumed that the
algorithm has converged.

4.5  The  Variable-Order  Algorithm 

The steps of the new algorithm may now be summarized. The algorithm
will be divided into four major steps: INITIALIZATION, TRANSFORMATION,
SCALAR SEARCH, and CONVERGENCE TEST. Within each of these steps,
several sub-steps will be identified. A computer program, written in
FORTRAN IV, which incorporates this algorithm is described in
Appendix I. This program also includes three other popular algorithms
as options. In this section we will only be concerned with those
aspects of the program that pertain to the VO algorithm. We will
refer to the variables and subroutine names used in the program to aid
in its understanding.

STEP 1: INITIALIZATION

This step consists of setting, or obtaining from the user of the
VO algorithm, several constants and options which will be required in
the other steps of the algorithm. This step is executed only once and
it is basically the MAIN program. The constants and options are the
following:

MAXIT (default is 50), the maximum number of iterations to be done
     so the algorithm will stop even if it has not satisfied the
     convergence test.
STPEPS (default is $10^{?}$; the exponent is illegible in the source),
     the algorithm considers stopping when the maximum coordinate of
     the gradient is less than STPEPS in absolute value. That is, if
     $\|f'(x^k)\|_\infty <$ STPEPS, and other conditions to be given
     are met, it is assumed that the algorithm has converged.
MAXAV (default is 2), to indicate how much information is supplied
     about the function to be minimized. It can have three
     values: MAXAV = 1 indicates that only the function values
     are supplied; MAXAV = 2 indicates that both the function and
     gradient values are supplied; MAXAV = 3 indicates that the
     function, the gradient and the hessian values are all supplied.
PHABS (default is STPEPS $\times 10^{-?}$; exponent illegible) is a
     constant used in approximating the hessian when both the gradient
     and the function values are supplied (MAXAV = 2) as described by
     $\epsilon_a$ in Section 4.3.
PHREL (default is $10^{-4}$) is a constant also used in approximating
     the hessian (for MAXAV = 2) as described by $\epsilon_r$ in
     Section 4.3.
GFABS (default is 0.0) is the absolute value of the error present
     in the gradient values supplied as described in Section 4.4
     by $\epsilon_{ga}$. If it is nonzero, STPEPS is set to 2*GFABS, and
     PHABS is set to 200*GFABS.
GFREL (default is 0.0) is the relative value of the error present
     in the gradient values supplied as described in Section 4.4
     by $\epsilon_{gr}$. If it is nonzero, PHREL is set to 200*GFREL.
PABS (default is STPEPS $\times 10^{?}$; exponent illegible) is a
     constant used in approximating both the gradient and the hessian
     when only function values are supplied (MAXAV = 1) as described
     in Section 4.3 by $\epsilon_a$.
PREL (default is $5 \times 10^{-?}$; exponent illegible) is also a
     constant used in approximating both the gradient and the hessian
     (MAXAV = 1) as described in Section 4.3 by $\epsilon_r$.
FABS (default is 0.0) is the absolute value of the error present
     in the function values supplied as described in Section 4.4
     by $\epsilon_{fa}$. If it is nonzero, STPEPS is set to 2*FABS, and
     PABS is set to 200*FABS.
FREL (default is 0.0) is the relative value of the error present
     in the function values supplied as described in Section 4.4
     by $\epsilon_{fr}$. If it is nonzero, PREL is set to 200*FREL.
RELSCH (default is $5 \times 10^{-2}$), the relative accuracy of the
     minimizing scalar search when coordinate searches are undertaken
     or when the gradient direction is used.
BOX (default is no box constraints) is a two-dimensional array
     which may optionally contain the box constraints. BOX(1,i),
     i = 1, ..., n, may be set to the lower limits of each
     independent variable. BOX(3,i), i = 1, ..., n, may be set to
     the higher limits of each independent variable.
NX, the dimension of the problem (n in all of the preceding
     development).
X (also called XK in some of the subroutines), an array containing
     the initial guess $x^0$ to a solution of the minimization
     problem with perhaps box constraints (nonlinear constraints
     must be handled manually by the penalty method described in
     Section 4.1).
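
The way the error constants override the other settings can be sketched in Python. This is only an illustration: the class `VOOptions` and the function `apply_error_settings` are hypothetical names, the fields whose default exponents are illegible in the source are left as required arguments, and the order in which GFABS and FABS act on STPEPS when both are nonzero is an assumption.

```python
from dataclasses import dataclass, replace


@dataclass
class VOOptions:
    # Defaults for these four are not legible in the source, so the
    # caller must supply them in this sketch.
    stpeps: float
    phabs: float
    pabs: float
    prel: float
    maxit: int = 50
    maxav: int = 2
    phrel: float = 1e-4
    gfabs: float = 0.0
    gfrel: float = 0.0
    fabs: float = 0.0
    frel: float = 0.0
    relsch: float = 5e-2


def apply_error_settings(opts):
    """Return a copy of opts with the error-driven overrides applied."""
    o = replace(opts)                 # shallow copy; input is left unchanged
    if o.gfabs != 0.0:
        o.stpeps = 2.0 * o.gfabs
        o.phabs = 200.0 * o.gfabs
    if o.gfrel != 0.0:
        o.phrel = 200.0 * o.gfrel
    if o.fabs != 0.0:
        o.stpeps = 2.0 * o.fabs       # applied last if both are nonzero (assumption)
        o.pabs = 200.0 * o.fabs
    if o.frel != 0.0:
        o.prel = 200.0 * o.frel
    return o
```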


Thus the user needs to provide NX and X only, since all other constants
and options have default values. In addition, the user must write a
subroutine which supplies the function, the gradient, and the hessian
values (depending on the setting of MAXAV) at a given point $x$.
Examples of how this subroutine may be written are given in Appendix I.
This step is completed by initializing several counters and evaluating
the function and the gradient at $x^0$. The counters keep track of how
many function evaluations, gradient evaluations, and hessian
evaluations have been required, and of how many times a given
transformation order has been used. While these counters are useful,
they are not essential to the algorithm, and thus their updating will
not be shown. The only counter which is required is the iteration
counter. In the program described in Appendix I this counter is denoted
by ITCNT. However, for notational convenience the letter k will be used
here. Also for notational convenience, while for example $f'(x^k)$ is
stored in array GF in the program, the mathematical notation will be
used here. The program has comments, however, which identify the arrays
that are used. Thus this step ends with the following two sub-steps.

1.  Set k = 0.

2.  Compute $f(x^0)$ and $f'(x^0)$. The gradient $f'(x^0)$ is
    approximated if MAXAV = 1 (see Sections 4.3 and 4.4). The
    subroutine that takes care of calling for the evaluations, and
    also of the approximation to $f'(x^0)$ when it is not supplied, is
    called OPGRAD.


STEP 2: TRANSFORMATION

This step is executed once for each iteration and the dominant
subroutine is called OPXKP1. Its purpose is to compute the high-order
corrections and to determine the order to use at the k-th iteration.
The following sub-steps may be identified.

1.  Compute the hessian $f''(x^k)$. The hessian is approximated if
    MAXAV $\ne$ 3 (see Sections 4.3 and 4.4). In the program of
    Appendix I, the hessian is stored in array G2F, and only the
    non-redundant part (the upper triangle) is stored. The storage
    scheme may be thought of as recording each row sequentially in a
    one-dimensional array, beginning each row with the diagonal
    element. This technique requires $(n^2 + n)/2$ storage locations,
    instead of $n^2$ if the entire hessian is stored. The subroutine
    which takes care of calling for the evaluation or approximation of
    the hessian is called OPHESS.

2.  Factorize the hessian. This step consists of computing the
    permutation matrix P, the diagonal matrix $D^k$, and the upper
    triangular matrix U such that $P[f''(x^k) + D^k]P^T = U^T U$ as
    described in Section 3.5. The permutation matrix P is most
    efficiently recorded in a vector of length n as done by
    subroutine SFAC in the program of Appendix I. Observe that the
    columns of P are permutations of the columns of the unit diagonal
    matrix, and therefore a single integer is sufficient to identify
    each column of P. Also the entire matrix $D^k$ is not required;
    only the norm $\|D^k\|_\infty$, which is the maximum diagonal
    element, is needed. Finally, in the program, U replaces the
    storage occupied by the hessian. Since the diagonal elements of
    the hessian are required whenever the gradient and the hessian
    have to be approximated, another array of length n is used to
    store the diagonal elements when needed by the next iteration.

3.  Compute the second-order correction $d_2^k$. The second-order
    correction is the solution of the linear system of equations
    given by (3.105). The solution is very efficiently obtained
    if the permutation matrix is stored in a vector as described
    in 2 above, since indirect indexing can be used, as done by
    subroutine SFBSUB in the program of Appendix I.

4.  Compute $x_2^k = h_2^k(1) = x^k - d_2^k$. This computation is done
    by function OPEVX in order to efficiently handle the box
    constraints.

5.  Compute $f(x_2^k)$ and $f'(x_2^k)$. If the gradient is not supplied,
    see Sections 4.3 and 4.4 and subroutine OPGRAD for the
    approximation technique.

6.  If $\|f'(x_2^k)\|_\infty <$ STPEPS, set $x^{k+1} = x_2^k$, and go to
    STEP 4. Otherwise continue.

7.  If $f(x^k) < f(x_2^k)$, set order r = 2, and go to STEP 3, otherwise
    continue.

8.  Compute the third-order correction $d_3^k$. The third-order
    correction is approximated by the solution of the linear system
    (3.111). The same comment as in 3 above applies to this sub-step.

9.  Compute $x_3^k = h_3^k(1) = x^k - d_2^k - d_3^k$. See 4 above for an
    applicable comment.

10. Compute $f(x_3^k)$ and $f'(x_3^k)$. See 5 above for an applicable
    comment.

11. If $\|f'(x_3^k)\|_\infty <$ STPEPS, set $x^{k+1} = x_3^k$, and go to
    STEP 4. Otherwise continue.

12. If $f(x_3^k) \ge f(x_2^k)$, set order r = 2, and go to STEP 3,
    otherwise continue.

13. Compute the fourth-order correction $d_4^k$. The fourth-order
    correction is approximated by the solution of (3.112). The same
    comment as in 3 above is applicable here.

14. Compute $x_4^k = h_4^k(1) = x^k - d_2^k - d_3^k - d_4^k$. See 4
    above for an applicable comment.

15. Compute $f(x_4^k)$. Entry OPFUNC in subroutine OPGRAD handles the
    calling of the user's subroutine to obtain function values.

16. If $f(x_4^k) > f(x_3^k)$, set order r = 3, otherwise set order
    r = 4. Go on to STEP 3.
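
The order-selection logic of sub-steps 4 through 16 can be sketched in Python as follows. This is a sketch under stated assumptions, not the OPXKP1 subroutine: the name `select_order` is hypothetical, box constraints are ignored, and the callable `next_correction(r, point)` is a stand-in for the linear solves (3.111) and (3.112), which are not reproduced here.

```python
def select_order(f, grad, x, d2, next_correction, stpeps):
    """Choose the transformation order r, or report early convergence.

    Returns ("converged", point) when a trial point passes the gradient
    test, otherwise ("search", r) with r in {2, 3, 4}.
    d2 is the second-order correction at x; next_correction(r, point) is
    a hypothetical callable returning the r-th order correction.
    """
    def inf_norm(v):
        return max(abs(c) for c in v)

    def minus(a, d):
        return [ai - di for ai, di in zip(a, d)]

    x2 = minus(x, d2)                                # sub-step 4
    if inf_norm(grad(x2)) < stpeps:                  # sub-step 6
        return ("converged", x2)
    if f(x) < f(x2):                                 # sub-step 7
        return ("search", 2)
    x3 = minus(x2, next_correction(3, x2))           # sub-steps 8-9
    if inf_norm(grad(x3)) < stpeps:                  # sub-step 11
        return ("converged", x3)
    if f(x3) >= f(x2):                               # sub-step 12
        return ("search", 2)
    x4 = minus(x3, next_correction(4, x3))           # sub-steps 13-14
    return ("search", 3 if f(x4) > f(x3) else 4)     # sub-step 16
```

The sketch makes visible that a higher order is accepted only while each additional correction keeps lowering the function value.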


STEP  3:   SCALAR  SEARCH 


The  scalar  search  step  was  described  in  detail  in  Section  3.4.   It 
consists  of  computing  a  suitable  value  of  the  scalar  parameter  p  in  the 

transformation  function  h  (p)  corresponding  to  the  order  r  selected  in 

'\ir 

STEP  2.   The  transformation  functions  are  listed  in  (3.53).   The 
vectors  that  are  the  coefficients  of  the  polynomial  in  p  in  the  trans- 
formation functions  are  stored  in  a  two-dimensional  array  called  C. 
The  actual  evaluation  of  h  (p)  for  any  value  of  p  is  done  in  the  func- 
tion OPEVX  where  the  box  constraints  are  handled  by  the  projection 
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method  described  in  Section  4.2.   The  following  sub-steps  may  be  iden- 
tified. 


1.  If order r > 2 go to 3, otherwise continue.

2.  Compute $p = p_k$ for the second-order transformation. The subroutine which handles this case is called OPORD2. Note that whenever the second-order transformation is selected, $x^k$ is far from a solution of the minimization problem, due to the manner in which the order is determined. That is, the infinite series (3.33) for which p = 1 converges, and therefore the tests in sub-steps 7 and 12 of STEP 2 will both be false when $x^k$ is close to $x^*$. Thus the value $p_k$ for the scalar parameter is computed as described in Section 3.4.2. Go to 6.

3.  If $\| f'(h_r^k(1)) \|_\infty > 1$, go to 5, otherwise continue.

4.  Compute $p = p_k$ for $x^k$ close to $x^*$. The value $p_k$ is taken from a solution to the scalar minimization problem (3.56) as described in Section 3.4.1. The subroutine which handles this case is OPORDH. Go to 6.

5.  Compute $p = p_k$ for $x^k$ far from $x^*$ and r > 2. This case is also handled by subroutine OPORDH. The parameter $p_k$ is computed to approximate a solution of (3.60) as described in Section 3.4.2.

6.  Set $x^{k+1} = h_r^k(p_k)$ and go to STEP 4. Actually, both $x^{k+1}$ and $p_k$ are returned by subroutines OPORD2 and OPORDH in the program described in Appendix I.
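The branching in sub-steps 1 through 5 can be summarized in a few lines. The following Python sketch is illustrative only: the function name `select_scalar_search` and the returned labels are hypothetical, the gradient norm at p = 1 is assumed to be already available, and the actual computations performed inside OPORD2 and OPORDH are not reproduced.

```python
def select_scalar_search(order, grad_norm_at_unit_step):
    """Return (subroutine, case) following sub-steps 1-5 of STEP 3.

    order                  -- order r chosen in STEP 2 (2, 3 or 4)
    grad_norm_at_unit_step -- norm of f'(h_r^k(1)), assumed already computed
    """
    if order <= 2:
        # Sub-step 2: second order implies x^k is far from x*.
        return ("OPORD2", "far from solution, Section 3.4.2")
    if grad_norm_at_unit_step > 1.0:
        # Sub-step 5: higher order, but x^k is still far from x*.
        return ("OPORDH", "far from solution, problem (3.60)")
    # Sub-step 4: higher order and x^k close to x*.
    return ("OPORDH", "close to solution, problem (3.56)")
```

In the worked Rosenbrock iteration given later in this section, the order is 4 and the gradient norm at p = 1 exceeds 1, so the second branch would be taken.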


STEP  4:   CONVERGENCE  TEST 


The purpose of this step is to check for convergence or for an excessive number of iterations in order to stop the algorithm. Except for the first sub-step below, which is done in subroutine OPXKP1, the logic of this step is in the MAIN program. The sub-steps may be described as follows.

1.  Compute $f(x^{k+1})$ and $f'(x^{k+1})$, if not already evaluated. The comment in sub-step 2 of STEP 1 is applicable here.

2.  If $\| f'(x^{k+1}) \|_\infty \le$ STPEPS and $\| D^k \|_\infty = 0$ ($\| D^k \|_\infty$ is stored in the variable called POS by subroutine SFAC), the algorithm is assumed to have converged to an approximation of $x^*$. Go to 11.

3.  If $\| f'(x^{k+1}) \|_\infty <$ STPEPS, go to 6.

4.  If  IfCx''^^  -  f(xS|  >  STPEPS  +  |f(x^|  *  (STPEPS)^^^  go  to  9. 

5.  If  IxJ"^^  -  xjl  >  STPEPS  +  |xj|  *  (STPEPS)^^^  for  any  i  =  1,  . .  .  , 
n,  then  go  to  9. 

6.  If coordinate searches have already been undertaken, then it is assumed that convergence has occurred. Tests 4 and 5 above are especially necessary when the supplied function and gradient have errors, since in that case the gradient may not be small at the point of solution due to its absolute errors. Go to 11.

7.  Set k = k + 1.

8.  Undertake coordinate searches. Subroutine OPCOOR handles the selection of coordinate directions, and OPBDRY handles the scalar minimization problem along the selected coordinate. As discussed in Section 3.6, coordinate searches provide some insurance that the point $x^k$ is a local minimum. Thus if coordinate searches find no function decrease along any of the coordinate directions, go to 11. Otherwise the coordinate search yields another $x^{k+1}$; go to 1 (in this STEP).

9.  Set k = k + 1.

10. If k ≤ MAXIT, go to STEP 2, otherwise continue.

11. Done. If k > MAXIT, the algorithm did not converge. Otherwise the point $x^k$ should be an approximation of $x^*$, a solution of the minimization problem with perhaps box constraints.


A complete iteration on an unconstrained minimization problem will now be given to illustrate the VO algorithm. The problem to be used is given by

$$\text{minimize} \quad f(x_1, x_2) = 100(x_2 - x_1^2)^2 + (1 - x_1)^2$$

This problem was proposed by Rosenbrock [44] and it has a parabolic valley which is extremely narrow. This test problem is widely used in the literature and most existing algorithms have been compared on it. The usual starting point is

$$x^0 = (-1.2, 1)^T$$

and

$$f(x^0) = 24.2, \qquad f'(x^0) = (-215.6, -88)^T$$

and

$$f''(x^0) = \begin{pmatrix} 1330 & 480 \\ 480 & 200 \end{pmatrix}$$

The only minimum point for this function is at $x^* = (1, 1)^T$, and $f(x^*) = 0$. The factorization of $f''(x^0)$ requires no pivoting and $D^0 = 0$ in the factorization (3.95). The U matrix is given by

$$U = \begin{pmatrix} 36.4692 & 13.1618 \\ 0 & 5.1737 \end{pmatrix}$$

Computing $d_2^0$ from (3.105) yields

$$d_2^0 = (-.024719, -.3807)^T$$
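The starting data above can be checked numerically. The sketch below, in plain Python, evaluates Rosenbrock's function, its gradient and hessian at $x^0$, forms the upper triangular Cholesky factor of the 2 x 2 hessian, and solves $f''(x^0)\, d = f'(x^0)$. Treating $d_2^0$ as the solution of that linear system is an assumption made here because it reproduces the printed numbers; equation (3.105) itself is not restated in this section. All function names are illustrative.

```python
import math

def rosenbrock(x1, x2):
    return 100.0 * (x2 - x1 ** 2) ** 2 + (1.0 - x1) ** 2

def gradient(x1, x2):
    return (-400.0 * x1 * (x2 - x1 ** 2) - 2.0 * (1.0 - x1),
            200.0 * (x2 - x1 ** 2))

def hessian(x1, x2):
    return ((1200.0 * x1 ** 2 - 400.0 * x2 + 2.0, -400.0 * x1),
            (-400.0 * x1, 200.0))

def cholesky_upper_2x2(h):
    """Entries (u11, u12, u22) of U with U^T U = h, for positive definite h."""
    u11 = math.sqrt(h[0][0])
    u12 = h[0][1] / u11
    u22 = math.sqrt(h[1][1] - u12 ** 2)
    return u11, u12, u22

def solve_2x2(h, g):
    """Solve h d = g by Cramer's rule (assumed stand-in for (3.105))."""
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    return ((h[1][1] * g[0] - h[0][1] * g[1]) / det,
            (h[0][0] * g[1] - h[1][0] * g[0]) / det)

x0 = (-1.2, 1.0)
f0 = rosenbrock(*x0)
g0 = gradient(*x0)
H0 = hessian(*x0)
U = cholesky_upper_2x2(H0)
d2 = solve_2x2(H0, g0)
x_trial = (x0[0] - d2[0], x0[1] - d2[1])   # the point x^0 - d_2^0
f_trial = rosenbrock(*x_trial)
```

With these definitions `f0`, `g0`, `H0`, `U` and `d2` agree with the values quoted in the text to the printed precision.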


Thus the second-order transformation from (3.53a) is given by

$$h_2^0(p) = \begin{pmatrix} -1.2 \\ 1 \end{pmatrix} - p \begin{pmatrix} -.024719 \\ -.3807 \end{pmatrix} \tag{4.33}$$

Proceeding,

$$f(h_2^0(1)) = f(x^0 - d_2^0) = 4.7319$$

and

$$f'(h_2^0(1)) = (-4.6378, -.1222)^T$$

Thus since $f(h_2^0(1)) < f(x^0)$, the third-order correction is computed from (3.111) to be

$$d_3^0 = (-.024407, .057967)^T$$

which gives the third-order transformation from (3.53b) by

$$h_3^0(p) = \begin{pmatrix} -1.2 \\ 1 \end{pmatrix} - p \begin{pmatrix} -.0370785 \\ -.57105 \end{pmatrix} - p^2 \begin{pmatrix} -.0120475 \\ .248317 \end{pmatrix} \tag{4.34}$$

Proceeding,

$$f(h_3^0(1)) = f(x^0 - d_2^0 - d_3^0) = 4.62658$$

and

$$f'(h_3^0(1)) = (-5.1315, -.3605)^T$$

Thus since $f(h_3^0(1)) < f(h_2^0(1))$, the fourth-order correction is computed from (3.112) to be

$$d_4^0 = (-.023968, .055721)^T$$

which gives the fourth-order transformation from (3.53c) by

$$h_4^0(p) = \begin{pmatrix} -1.2 \\ 1 \end{pmatrix} - p \begin{pmatrix} -.045318 \\ -.6979 \end{pmatrix} - p^2 \begin{pmatrix} -.024096 \\ .4966 \end{pmatrix} - p^3 \begin{pmatrix} -.0036806 \\ -.065691 \end{pmatrix} \tag{4.35}$$

and since

$$f(h_4^0(1)) = 4.5246 < f(h_3^0(1))$$

the fourth-order transformation is selected. All that remains now is to compute the value of p to be used in (4.35) using the techniques outlined in Section 3.4, which are illustrated next. Since

$$\| f'(h_4^0(1)) \|_\infty > 1$$

it is concluded that $x^0$ is far from the solution. Setting the derivative of each component of $h_4^0$ in (4.35) with respect to p equal to zero yields the following two polynomials

$$.045318 + .048192\,p + .0110424\,p^2 = 0$$

and

$$.6979 - .9932\,p + .197073\,p^2 = 0$$

The first polynomial has no positive zeros (both of its zeros are negative), implying that the coordinate $x_1$ moves away from $x_1^0$ for all p > 0 as p is increased. The second polynomial has zeros given by

$$p = .844 \quad \text{and} \quad p = 4.1957$$

Equation (3.75), which is $f'(x^0)^T h_4^{0\prime}(p) = 0$, becomes

$$-71.1858 + 77.0114\,p - 19.7232\,p^2 = 0$$

which yields the two zeros

$$p = 1.502 \quad \text{and} \quad p = 2.402$$

The largest zero is 4.1957, which yields

$$h_4^0(4.1957) = (-.3138, .03796)^T$$

and

$$f(h_4^0(4.1957)) = 2.092$$

Since this function value is even less than $f(h_4^0(1))$, this value of p satisfies (3.60b) for any $c > 0$. Therefore the iteration is complete, that is $p_0 = 4.1957$, and

$$x^1 = h_4^0(4.1957) = (-.3138, .03796)^T$$
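The arithmetic of this scalar search can be replayed with a short Python sketch. It uses the coefficient vectors printed in (4.35), computes the real zeros of the two component derivatives and of the directional polynomial, and takes the largest positive zero, as this example does. The helper names are hypothetical, and the selection rule coded here only mimics this one worked example; it is not a full implementation of the procedure of Section 3.4 (in particular, the acceptance test (3.60b) is not coded).

```python
import math

X0 = (-1.2, 1.0)
G0 = (-215.6, -88.0)            # f'(x^0) from the text
C1 = (-0.045318, -0.6979)       # coefficient of p   in (4.35)
C2 = (-0.024096, 0.4966)        # coefficient of p^2 in (4.35)
C3 = (-0.0036806, -0.065691)    # coefficient of p^3 in (4.35)

def rosenbrock(x1, x2):
    return 100.0 * (x2 - x1 ** 2) ** 2 + (1.0 - x1) ** 2

def h4(p):
    """Fourth-order transformation (4.35) evaluated at p."""
    return tuple(X0[i] - p * C1[i] - p ** 2 * C2[i] - p ** 3 * C3[i]
                 for i in range(2))

def dh4_coeffs(i):
    """(a0, a1, a2) such that d/dp of component i is a0 + a1 p + a2 p^2."""
    return (-C1[i], -2.0 * C2[i], -3.0 * C3[i])

def real_zeros(a0, a1, a2):
    """Real zeros of a0 + a1 p + a2 p^2, ascending (empty list if none)."""
    disc = a1 * a1 - 4.0 * a2 * a0
    if disc < 0.0:
        return []
    root = math.sqrt(disc)
    return sorted([(-a1 - root) / (2.0 * a2), (-a1 + root) / (2.0 * a2)])

zeros_x1 = real_zeros(*dh4_coeffs(0))
zeros_x2 = real_zeros(*dh4_coeffs(1))
# Directional polynomial f'(x^0)^T dh4/dp, i.e. equation (3.75) for this example.
directional = tuple(G0[0] * dh4_coeffs(0)[j] + G0[1] * dh4_coeffs(1)[j]
                    for j in range(3))
zeros_dir = real_zeros(*directional)

# Largest positive zero among all candidates, as chosen in the worked example.
p0 = max(p for p in zeros_x1 + zeros_x2 + zeros_dir if p > 0.0)
x_next = h4(p0)
f_next = rosenbrock(*x_next)
```

Evaluating `h4(p0)` costs a single extra function evaluation of `rosenbrock`, which is the point made in the discussion of Figure 4.2.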


Figure 4.2 shows the $x_1$, $x_2$ plane with equi-contours of Rosenbrock's function shown by the dashed curves. Note the three curves emanating from $x^0 = (-1.2, 1)^T$. These curves are the trajectories generated by the three transformations (4.33), (4.34) and (4.35) as p is increased. Observe that all transformations give $x^0$ for p = 0, but that as p is increased they follow different curves. Observe the superiority of the fourth-order transformation. Note that the proposed scalar search procedure selected a p which made considerable progress, yet it needed only one additional function evaluation. In Fig. 4.3 the function $f(h_4^0(p))$ is plotted as a function of p. Observe that had a standard scalar search which brackets the minimum been used, it would probably have computed $p_0 \approx 2$, which is of course not as good as $p_0 = 4.1957$. These local minima close to p = 0 in the scalar search occurred frequently with the higher-order transformations, which is another justification for the proposed scalar search technique.

Figure 4.2  Trajectories for the second-, third-, and fourth-order transformations at $x^0 = (-1.2, 1)^T$ for the minimization of Rosenbrock's function (4.36), (a) projected onto the $x_2$ vs. $x_1$ plane, and (b) a 3-dimensional perspective view with the "eye" at f = 0, $x_1$ = -1.3, $x_2$ = -.2 with the vertical axis being f.

Figure 4.3  Plot of $f(h_4^0(p))$ vs. p, where f is Rosenbrock's function (4.36), and $h_4^0(p)$ is given by (4.35). Note that the procedure outlined for the scalar search selects $p = p_0 = 4.1957$ shown in the graph.

4.6  Numerical  Results 

In  this  section  the  results  of  using  the  VO  algorithm  for  solving 
several  test  problems  will  be  presented.   The  first  five  problems 
chosen  are  unconstrained  minimization  problems  that  are  widely  used 
in  the  literature,  and  thus  the  performance  of  the  VO  algorithm  is 
compared  with  other  popular  algorithms  using  published  results.   Then 
several  other  test  problems  are  presented  which  illustrate  each  of  the 
practical  aspects  of  the  circuit  optimization  problem  described  in  this 
chapter. 

All of the computer runs were done on an IBM/370 Model 155 computer using the MVT system software package. Unless noted, the default values for all the constants used by the algorithm given in Section 4.5 are always used. Several abbreviations are used in summarizing the results. To avoid repetition these abbreviations are explained here. They are

No. ITN - the number of iterations.

No. FUN - the number of function evaluations.

No. GRAD - the number of gradient evaluations.

No. HESS - the number of hessian evaluations.

ORDER - the order of the transformation function selected.

x* - the exact solution of the minimization problem being solved.

x - the approximation computed by the algorithm.
4.6.1  Rosenbrock's Problem

Rosenbrock's problem [44] was introduced earlier to illustrate one iteration of the VO algorithm. The problem is given by

$$\underset{x}{\text{minimize}} \quad f(x) = 100(x_2 - x_1^2)^2 + (1 - x_1)^2, \tag{4.36}$$

with initial values given by

$$x^0 = (-1.2, 1)^T, \quad \text{and} \quad f(x^0) = 24.2$$

and with minimum values given by

$$x^* = (1, 1)^T, \quad \text{and} \quad f(x^*) = 0$$

Table 4.1 summarizes the results of using the VO algorithm to solve this problem with three different values of the option MAXAV. Observe that while the algorithm generates almost identical sequences when far from the solution regardless of the value of MAXAV, in the vicinity of $x^*$ the more information about the function that is supplied, the better the convergence behavior, a property which was expected. These results also indicate that the higher-order transformation functions were selected in most iterations. Figure 4.4 shows the entire trajectory in the $x_2$ vs. $x_1$ plane from the initial point $x^0$ to $x^5$, which is in the neighborhood of $x^*$. The plot is for MAXAV = 3, and one equi-contour of the function is shown in dashed curves. Each $x^k$, from k = 0 to k = 5, computed by the algorithm is noted in the figure. The trajectory between
TABLE 4.1  Results for Rosenbrock's problem with (a) the function, the gradient, and the hessian values supplied (MAXAV = 3), (b) the function and the gradient values supplied (MAXAV = 2), and (c) only the function values supplied (MAXAV = 1). Note that the zero given for the eighth iteration in (a) was actually a printed result. The computer times required were .04 seconds for (a), .04 seconds for (b), and .053 seconds for (c).

(a)

  k   No. FUN   No. GRAD   No. HESS   ORDER   ||x^k - x*||_inf   f(x^k) - f(x*)   ||f'(x^k)||_inf
  0      1         1          0         -     2.2                24.2             215.6
  1      6         4          1         4     1.314              2.0921           12.1
  2     11         7          2         4     .9843              1.55             15.25
  3     18        10          3         4     .3961              .383             17.14
  4     20        12          4         2     .3024              .0663            6.727
  5     25        15          5         4     5x10^-3            7.8x10^-3        3.545
  6     30        18          6         4     1x10^?             2.6x10^?         1.3x10^-3
  7     33        21          7         4     1x10^-9            7.3x10^-20       1.1x10^?
  8     34        22          8         2     0                  0                0

(b)

  k   No. FUN   No. GRAD   ORDER   ||x^k - x*||_inf   f(x^k) - f(x*)   ||f'(x^k)||_inf
  0      1         1         -     2.2                24.2             215.6
  1      8         6         4     1.314              2.092            12.1
  2     15        11         4     .9845              1.549            15.23
  3     24        16         4     .4091              .36              16.29
  4     28        20         2     .3152              .06698           6.504
  5     36        25         4     .0102              7.3x10^?         3.411
  6     43        30         4     4x10^?             4.2x10^?         3.6x10^?
  7     47        34         3     4x10^-9            3x10^?           3.9x10^?

(c)

  k   No. FUN   ORDER   ||x^k - x*||_inf   f(x^k) - f(x*)   ||f'(x^k)||_inf
  0      3        -     2.2                24.2             215.6
  1     17        4     1.313              2.095            12.16
  2     31        4     .9804              1.565            15.54
  3     45        4     .1932              .5985            26.27
  4     59        4     .1219              1.3x10^?         3.591
  5     73        4     .02                5.2x10^?         .4573
  6     88        4     1x10^?             3.6x10^?         .0155
  7    102        4     7x10^?             9.6x10^-14       1.6x10^?
  8    108        2     2x10^-9            4.8x10^-19       5.5x10^?
Figure 4.4  Trajectory of the VO algorithm for the minimization of Rosenbrock's function (4.36), $x^0 = (-1.2, 1)^T$, $x^5 = (1.002, .9951)^T$. Minimum is $x^8 = x^* = (1, 1)^T$.

each $x^k$ as a function of p is also shown. Observe that each trajectory tends to follow the function contours.


4.6.2  Powell's Problem

The problem

$$\underset{x}{\text{minimize}} \quad f(x) = (x_1 + 10x_2)^2 + 5(x_3 - x_4)^2 + (x_2 - 2x_3)^4 + 10(x_1 - x_4)^4, \tag{4.37}$$

was originally proposed by Powell [41]. The initial values are

$$x^0 = (3, -1, 0, 1)^T, \quad \text{and} \quad f(x^0) = 215$$

and the minimum values are

$$x^* = (0, 0, 0, 0)^T, \quad \text{and} \quad f(x^*) = 0$$

This function has a singular (positive semidefinite) hessian at $x = x^*$. Thus from the convergence analysis of Section 3.6.2, the VO algorithm converges to $x^*$ linearly (order equal to one). The linear convergence is indeed verified by the results summarized in Table 4.2. In comparison with the results for Rosenbrock's problem in Table 4.1, the different rates of convergence can be clearly discerned. However, while the point of convergence is within $10^{-4}$ of $x^*$, which is not as close as the convergence point for Rosenbrock's problem, the final function and gradient values are quite close to their minimum values. The computer times reported in Table 4.2 illustrate the difficulty in timing small intervals of computer time. While the VO algorithm required the
TABLE 4.2  Results for Powell's problem with (a) the function, the gradient, and the hessian values supplied (MAXAV = 3), (b) the function and the gradient values supplied (MAXAV = 2), and (c) only the function values supplied (MAXAV = 1). The computer times required were .033 seconds for (a), .027 seconds for (b), and .04 seconds for (c).

(a)

  k   No. FUN   No. GRAD   No. HESS   ORDER   ||x^k - x*||_inf   f(x^k) - f(x*)   ||f'(x^k)||_inf
  0      1         1          0         -     3                  215              310
  1      6         4          1         4     .1703              2.3x10^?         2.547
  2     14         7          2         4     .0169              4.1x10^?         1.2x10^?
  3     20        10          3         4     2x10^?             3x10^?           9.1x10^?
  4     26        13          4         3     1x10^?             2.2x10^?         7.3x10^?

(b)

  k   No. FUN   No. GRAD   ORDER   ||x^k - x*||_inf   f(x^k) - f(x*)   ||f'(x^k)||_inf
  1     10         8         4     .1499              2.8x10^?         2.356
  2     22        15         4     .0171              4.3x10^?         3.5x10^?
  3     32        22         4     2x10^?             3.1x10^?         9.6x10^?
  4     42        29         4     2x10^?             2.3x10^?         7.6x10^?

(c)

  k   No. FUN   ORDER   ||x^k - x*||_inf   f(x^k) - f(x*)   ||f'(x^k)||_inf
  1     35        4     .1671              2.4x10^?         2.655
  2     65        4     .0173              3.7x10^?         1.7x10^?
  3     93        4     1x10^?             2.4x10^?         4.2x10^?
  4    121        4     5x10^?             2.4x10^?         4.3x10^?
same number of iterations whether MAXAV was equal to 3 or 2, the computer time was measured to be less when MAXAV = 2. This relationship in the computer times was not expected. The discrepancy is probably due to experimentally observed random errors in timing small intervals of time in the IBM/370 Mod 165 computer with the system software package used [24].
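The singular hessian at the minimum of Powell's function (4.37), which is what reduces the convergence to linear, can be exhibited directly. The Python sketch below codes the function, its gradient and its hessian by hand (the derivative formulas are worked out here, not quoted from the source) and applies the hessian at $x^*$ to two vectors that it annihilates. The names `powell`, `powell_gradient`, `powell_hessian` and `mat_vec` are illustrative.

```python
def powell(x):
    x1, x2, x3, x4 = x
    return ((x1 + 10 * x2) ** 2 + 5 * (x3 - x4) ** 2
            + (x2 - 2 * x3) ** 4 + 10 * (x1 - x4) ** 4)

def powell_gradient(x):
    x1, x2, x3, x4 = x
    a, b, u, w = x1 + 10 * x2, x3 - x4, x2 - 2 * x3, x1 - x4
    return [2 * a + 40 * w ** 3,
            20 * a + 4 * u ** 3,
            10 * b - 8 * u ** 3,
            -10 * b - 40 * w ** 3]

def powell_hessian(x):
    x1, x2, x3, x4 = x
    u2, w2 = (x2 - 2 * x3) ** 2, (x1 - x4) ** 2
    return [[2 + 120 * w2, 20, 0, -120 * w2],
            [20, 200 + 12 * u2, -24 * u2, 0],
            [0, -24 * u2, 10 + 48 * u2, -10],
            [-120 * w2, 0, -10, 10 + 120 * w2]]

def mat_vec(h, v):
    return [sum(h[i][j] * v[j] for j in range(4)) for i in range(4)]

x_start = (3, -1, 0, 1)
x_min = (0, 0, 0, 0)
H_min = powell_hessian(x_min)
# Two independent directions along which the hessian at x* vanishes.
null_1 = mat_vec(H_min, [10, -1, 0, 0])
null_2 = mat_vec(H_min, [0, 0, 1, 1])
```

Both products are the zero vector, so the hessian at $x^*$ has rank two, and along those two directions the function grows only through its quartic terms.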

4.6.3  Fletcher and Powell's Problem

The problem

$$\underset{x}{\text{minimize}} \quad f(x) = 100\left[(x_3 - 10\theta)^2 + (R - 1)^2\right] + x_3^2, \tag{4.38}$$

where

$$-\pi/2 < 2\pi\theta = \tan^{-1}(x_2/x_1) < 3\pi/2,$$

$$R = (x_1^2 + x_2^2)^{1/2}$$

was proposed by Fletcher and Powell [22]. The function has a steep helical valley. The initial values are

$$x^0 = (-1, 0, 0)^T, \quad \text{and} \quad f(x^0) = 2500$$

and the minimum values are

$$x^* = (1, 0, 0)^T, \quad \text{and} \quad f(x^*) = 0$$

This function is not defined at the points $x = (0, 0, x_3)^T$, for any value of $x_3$. Furthermore, a continuous definition is not possible. Thus the sufficient conditions for the global convergence of the VO algorithm are not met. Nevertheless, as the results summarized in Table 4.3 show, the algorithm converges. Observe, however, that $x^k$ initially moves further away from $x^*$. Apparently this behavior is due to the previously mentioned discontinuities and the fact that the hessian has one large negative eigenvalue over the first four iterations. This large negative eigenvalue is reflected in a large diagonal element of the D matrix, computed during the factorization described in Section 3.5, for the first four iterations.
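The branch condition on $\theta$ in (4.38) is easy to get wrong, so a small Python sketch of the function may help. It follows the stated range $-\pi/2 < 2\pi\theta < 3\pi/2$ by adding $\pi$ to the principal arctangent when $x_1 < 0$. Two choices below are assumptions of this sketch rather than statements from the source: the value assigned on the plane $x_1 = 0$ with $x_2 \ne 0$, and raising an error on the $x_3$ axis where the function is undefined.

```python
import math

def theta(x1, x2):
    """Angle variable of (4.38), with -1/4 < theta < 3/4."""
    if x1 == 0.0 and x2 == 0.0:
        raise ValueError("theta is undefined on the x3 axis")
    if x1 == 0.0:
        # Convention assumed here for the plane x1 = 0.
        angle = math.copysign(math.pi / 2.0, x2)
    else:
        angle = math.atan(x2 / x1)
        if x1 < 0.0:
            angle += math.pi      # move to the branch (pi/2, 3*pi/2)
    return angle / (2.0 * math.pi)

def helical_valley(x):
    x1, x2, x3 = x
    r = math.hypot(x1, x2)
    return 100.0 * ((x3 - 10.0 * theta(x1, x2)) ** 2 + (r - 1.0) ** 2) + x3 ** 2

f_start = helical_valley((-1.0, 0.0, 0.0))
f_min = helical_valley((1.0, 0.0, 0.0))
# theta jumps by one full turn across the half plane x1 = 0, x2 < 0.
jump = theta(-1e-9, -1.0) - theta(1e-9, -1.0)
```

The unit jump in `theta` across the half plane $x_1 = 0$, $x_2 < 0$ is the kind of discontinuity the text refers to when it notes that no continuous definition of the function is possible.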


4.6.4  Wood's Problem

The problem

$$\underset{x}{\text{minimize}} \quad f(x) = 100(x_2 - x_1^2)^2 + (1 - x_1)^2 + 90(x_4 - x_3^2)^2 + (1 - x_3)^2 + 10.1\left[(x_2 - 1)^2 + (x_4 - 1)^2\right] + 19.8(x_2 - 1)(x_4 - 1), \tag{4.39}$$

was proposed by C. F. Wood in a study reported by Colville [10]. The initial values are

$$x^0 = (-3, -1, -3, -1)^T, \quad \text{and} \quad f(x^0) = 19192$$

and the minimum values are

$$x^* = (1, 1, 1, 1)^T, \quad \text{and} \quad f(x^*) = 0$$

This function has a saddle point at approximately the point $x = x_s$,
TABLE 4.3  Results for Fletcher and Powell's helical valley problem. Part (a) is for the function, the gradient and the hessian supplied (MAXAV = 3), (b) is for the function and the gradient supplied (MAXAV = 2), and (c) is for only the function values supplied (MAXAV = 1). The computer times required were .067 seconds for (a), .073 seconds for (b), and .097 seconds for (c).

(a)

  k   No. FUN   No. GRAD   No. HESS   ORDER   ||x^k - x*||_inf   f(x^k) - f(x*)   ||f'(x^k)||_inf
  0      1         1          0         -     2                  2500             1591.5
  1      6         4          1         4     4.45               32.78            122.6
  2      8         6          2         2     4.04               27.335           101.1
  3     13         9          3         4     3.44               15.45            44.86
  9     46        26          9         4     5x10^?             4.8x10^-13       9.4x10^?
 10     47        27         10         2     9x10^-13           1.2x10^?         1x10^?

(b)

  k   No. FUN   No. GRAD   ORDER   ||x^k - x*||_inf   f(x^k) - f(x*)   ||f'(x^k)||_inf
  1      9         7         4     4.45               32.78            122.6
  2     14        12         2     4.04               27.33            101.1
 10     75        57         4     5x10^?             3x10^-11         5.9x10^?
 11     80        62         3     7x10^-12           4.9x10^-23       2.2x10^-11

(c)

  k   No. FUN   ORDER   ||x^k - x*||_inf   f(x^k) - f(x*)   ||f'(x^k)||_inf
  1     23        2     4.95               24.75            15.95
  2     49        4     4.1                19.19            33.34
 10    208        4     2x10^?             2.6x10^?         5x10^?
 11    218        2     2x10^-12           1.7x10^-22       2.6x10^-10
where $x_s$ is given by

$$x_s = (-.9679740249375927,\ .9471391408178411,\ -.9695163103315915,\ .9512476657923259)^T$$

The  magnitude  of  the  gradient  at  x^  satisfies  11  f'(x^)  11   <  4  x  10    , 

Oi  "   '\,    X       "  CD  ' 

and the hessian $f''(x_s)$ is indefinite. The function value is $f(x_s) = 7.87696716518$. Table 4.4 summarizes the results for the classical starting point $x^0 = (-3, -1, -3, -1)^T$. The algorithm was not affected by the saddle point for this initial point. However, for the initial point

$$x^0 = (-.9670, .9481, -.9685, .9522)^T$$

which is very close to the saddle point $x_s$, the VO algorithm requires substantially more iterations to converge to $x^*$, as the results summarized in Table 4.5 show. Observe that initially the algorithm tends to be attracted to the saddle point, but that it extricates itself from this neighborhood to eventually converge to $x^*$. The program described in Appendix I also includes, as options, the steepest descent algorithm [34], the conjugate gradients algorithm of Fletcher and Reeves [23], and the quasi-Newton algorithm of Fletcher and Powell [22]. From the initial guess $x^0$ in the neighborhood of the saddle point, all of these algorithms, as implemented by this program, converged to the saddle point. It should be noted that the implementations of these algorithms in this program converge for other classical
TABLE 4.4  Results for Wood's problem. Part (a) is for the function, the gradient and the hessian supplied (MAXAV = 3), (b) is for the function and the gradient supplied (MAXAV = 2), and (c) is for only the function values supplied (MAXAV = 1). The computer times required were .04 seconds for (a), .04 seconds for (b), and .047 seconds for (c).

(a)

  k   No. FUN   No. GRAD   No. HESS   ORDER   ||x^k - x*||_inf   f(x^k) - f(x*)   ||f'(x^k)||_inf
  0      1         1          0         -     4                  19192            12008
  1      6         4          1         4     3.84               3100.4           1343
  2     15         7          2         4     .259               .112             2.81
  3     19         9          3         2     .03                .055             6.9
  4     24        12          4         4     2x10^?             2.8x10^?         4.6x10^?
  5     27        15          5         4     4x10^?             3x10^?           5x10^?
  6     28        16          6         2     10^-15             2x10^-32         5x10^-15

(b)

  k   No. FUN   No. GRAD   ORDER   ||x^k - x*||_inf   f(x^k) - f(x*)   ||f'(x^k)||_inf
  1     10         8         4     3.83               3099.7           1343
  5     47        35         4     5x10^-9            1.8x10^?         4x10^?
  6     52        40         2     10^-15             1.8x10^-27       1x10^-13

(c)

  k   No. FUN   ORDER   ||x^k - x*||_inf   f(x^k) - f(x*)   ||f'(x^k)||_inf
  1     37        4     3.84               3097.8           1344
  4    117        4     1x10^-3            8x10^?           2x10^?
  5    142        4     1x10^?             9x10^-13         4.7x10^?
TABLE 4.5  Results for Wood's problem with initial point in the neighborhood of its saddle point. These results are for MAXAV = 2 (the function and the gradient supplied). The behavior was similar for other MAXAV settings.

  k   No. FUN   No. GRAD   ORDER   ||x^k - x*||_inf   f(x^k) - f(x*)   ||f'(x^k)||_inf
  0      1         1         -     1.97               7.878            1.1
  1     12         8         4     1.98               7.877            .072
  2     21        15         4     2.03               7.875            .82
  3     29        21         2     2.1                7.873            2.4
  4     39        28         3     2.18               7.85             2.89
  5     48        35         3     2.23               7.80             .68
  6     54        41         2     2.31               7.75             4.77
  7     65        49         4     2.33               7.61             5.02
  8     74        55         4     2.35               7.31             5.34
  9     82        62         4     2.36               6.85             3.16
 10     88        68         2     2.34               6.51             11.8
 11     99        76         4     2.21               5.45             12.5
 12    106        81         2     2.1                5.01             22.1
 13    115        88         4     1.81               4.51             39.5
 22    188       147         4     .03                1.6x10^?         .93
 23    199       154         4     4x10^-4            2x10^?           7x10^?
 24    206       161         4     10^?               1.3x10^?         2x10^-10
problems in the same manner as reported in the literature. Thus it appears that the convergence of these algorithms to the saddle point is not due to this particular implementation.
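The saddle point reported for Wood's function can be checked numerically. The Python sketch below codes (4.39) with hand-derived gradient and hessian (worked out here, not quoted from the source), evaluates them at the printed saddle coordinates, and shows that the hessian is indefinite by exhibiting one direction of positive curvature and one of negative curvature. The names and the particular test directions `v_up` and `v_down` are choices of this sketch.

```python
def wood(x):
    x1, x2, x3, x4 = x
    return (100.0 * (x2 - x1 ** 2) ** 2 + (1.0 - x1) ** 2
            + 90.0 * (x4 - x3 ** 2) ** 2 + (1.0 - x3) ** 2
            + 10.1 * ((x2 - 1.0) ** 2 + (x4 - 1.0) ** 2)
            + 19.8 * (x2 - 1.0) * (x4 - 1.0))

def wood_gradient(x):
    x1, x2, x3, x4 = x
    return [-400.0 * x1 * (x2 - x1 ** 2) - 2.0 * (1.0 - x1),
            200.0 * (x2 - x1 ** 2) + 20.2 * (x2 - 1.0) + 19.8 * (x4 - 1.0),
            -360.0 * x3 * (x4 - x3 ** 2) - 2.0 * (1.0 - x3),
            180.0 * (x4 - x3 ** 2) + 20.2 * (x4 - 1.0) + 19.8 * (x2 - 1.0)]

def wood_hessian(x):
    x1, x2, x3, x4 = x
    return [[1200.0 * x1 ** 2 - 400.0 * x2 + 2.0, -400.0 * x1, 0.0, 0.0],
            [-400.0 * x1, 220.2, 0.0, 19.8],
            [0.0, 0.0, 1080.0 * x3 ** 2 - 360.0 * x4 + 2.0, -360.0 * x3],
            [0.0, 19.8, -360.0 * x3, 200.2]]

def quadratic_form(h, v):
    return sum(v[i] * h[i][j] * v[j] for i in range(4) for j in range(4))

X_SADDLE = (-0.9679740249375927, 0.9471391408178411,
            -0.9695163103315915, 0.9512476657923259)

f_saddle = wood(X_SADDLE)
g_saddle_norm = max(abs(g) for g in wood_gradient(X_SADDLE))
H = wood_hessian(X_SADDLE)
v_up = [1.0, 0.0, 0.0, 0.0]
# Opposite moves in x2 and x4, with x1 and x3 chosen to minimize the form.
v_down = [-H[0][1] / H[0][0], 1.0, H[2][3] / H[2][2], -1.0]
curv_up = quadratic_form(H, v_up)
curv_down = quadratic_form(H, v_down)
```

A positive `curv_up` together with a negative `curv_down` confirms that the stationary point is a saddle, which is why a method that relies only on a small gradient can stop there.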


4.6.5  Cragg and Levy's Problem

The problem

$$\underset{x}{\text{minimize}} \quad f(x) = (e^{x_1} - x_2)^4 + 100(x_2 - x_3)^6 + \tan^4(x_3 - x_4) + x_1^8 + (x_4 - 1)^2, \tag{4.40}$$

was proposed by Cragg and Levy [11]. The initial values are

$$x^0 = (1, 2, 2, 2)^T, \quad \text{and} \quad f(x^0) = 2.26618$$

and the minimum values are

$$x^* = (0, 1, 1, 1)^T, \quad \text{and} \quad f(x^*) = 0$$

This function has a singular (positive semidefinite) hessian at $x = x^*$. The results obtained by using the VO algorithm are summarized in Table 4.6. Observe that the gradient and the hessian of this function have terms involving exponentials, trigonometric functions and terms raised to large powers. These terms cause the approximations to the hessian and the gradient to be less accurate than for the previous functions. This inaccuracy is manifested in the results, since they are somewhat sensitive to whether the gradient and the hessian are supplied. Thus
TABLE 4.6  Results for Cragg and Levy's problem with (a) the function, the gradient, and the hessian values supplied (MAXAV = 3), (b) the function and the gradient values supplied (MAXAV = 2), and (c) only the function values supplied (MAXAV = 1). The computer times required were .047 seconds for (a), .037 seconds for (b), and .067 seconds for (c).

(a)

  k   No. FUN   No. GRAD   No. HESS   ORDER   ||x^k - x*||_inf   f(x^k) - f(x*)   ||f'(x^k)||_inf
  0      1         1          0         -     1                  2.27             12.03
  1      4         3          1         2     1                  1.6              8.7
  2      9         6          2         4     .51                .481             3.1
  3     14         9          3         3     .53                .106             .71
  6     31        18          6         4     .012               2x10^?           7x10^?
  7     38        21          7         4     5x10^?             1.5x10^-12       4.8x10^?

(b)

  k   No. FUN   No. GRAD   ORDER   ||x^k - x*||_inf   f(x^k) - f(x*)   ||f'(x^k)||_inf
  1      8         7         2     1                  1.62             8.7
  2     17        14         4     .445               .147             .76
  3     27        21         4     .0955              3.2x10^?         7x10^-4
  4     38        28         4     .0324              5.5x10^?         1x10^?
  5     48        35         4     6x10^?             9x10^-12         8x10^-9

(c) 


Counters 

No. 
k  FUN 


ORDER 


k  * 

X  -X 

Oi   'V. 


f(xS-f(x*) 


f'CxS  II 


1   27 

6  178 

7  204 


8x10 


-3 


1.62 
-11 


6x10 


-3 


5x10 


9x10 


■12 


8.7 
4xlO~^ 


8x10 


-9 
it is conjectured that it was a matter of chance that the VO algorithm required fewer iterations when the function and the gradient values were supplied (MAXAV = 2) than when the function, the gradient and the hessian values were supplied (MAXAV = 3).
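For reference, Cragg and Levy's function (4.40) can be coded as follows. The exponents used are the ones that reproduce the printed starting value $f(x^0) = 2.26618$ and the starting gradient norm of 12.03 listed in Table 4.6; the gradient formula is derived here by hand and the function names are illustrative.

```python
import math

def cragg_levy(x):
    x1, x2, x3, x4 = x
    return ((math.exp(x1) - x2) ** 4 + 100.0 * (x2 - x3) ** 6
            + math.tan(x3 - x4) ** 4 + x1 ** 8 + (x4 - 1.0) ** 2)

def cragg_levy_gradient(x):
    x1, x2, x3, x4 = x
    e = math.exp(x1)
    a = e - x2                    # argument of the fourth-power term
    b = x2 - x3                   # argument of the sixth-power term
    t = math.tan(x3 - x4)
    sec2 = 1.0 + t * t            # derivative of tan
    return [4.0 * a ** 3 * e + 8.0 * x1 ** 7,
            -4.0 * a ** 3 + 600.0 * b ** 5,
            -600.0 * b ** 5 + 4.0 * t ** 3 * sec2,
            -4.0 * t ** 3 * sec2 + 2.0 * (x4 - 1.0)]

x_start = (1.0, 2.0, 2.0, 2.0)
x_min = (0.0, 1.0, 1.0, 1.0)
f_start = cragg_levy(x_start)
g_start_norm = max(abs(g) for g in cragg_levy_gradient(x_start))
f_min = cragg_levy(x_min)
g_min_norm = max(abs(g) for g in cragg_levy_gradient(x_min))
```

Because every term except $(x_4 - 1)^2$ enters with a power of four or higher, only the last diagonal entry of the hessian is nonzero at $x^*$, which is the singularity mentioned in the text. Those high powers are also what make difference approximations of the derivatives less accurate for this problem.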

4.6.6  Comparisons  with  Seven  Minimization  Algorithms 

Published  results  will  be  used  to  compare  seven  popular  algorithms 
with  the  VO  algorithm.   The  existing  algorithms  which  will  be  used  in 
the  comparisons  are  the  following: 

FR - Fletcher and Reeves [23] conjugate gradients (function and gradient values are supplied).

DFP - Davidon [12], Fletcher and Powell [22] quasi-Newton with rank two updates (function and gradient values are supplied).

B - Broyden [7] quasi-Newton with rank one updates (function and gradient values are supplied).

F - Fletcher [21] quasi-Newton with a combination of rank one and rank two updates (function and gradient values are supplied).

P - Powell [41] conjugate directions (function values are supplied).

S - Stewart [49] DFP algorithm with gradients approximated by a difference scheme (function values are supplied).

C - Cullum [12] B algorithm with the gradient approximated by a difference scheme (function values are supplied).

Almost all the results for the above algorithms on the preceding five test problems (Sections 4.6.1 through 4.6.5) are obtained from a comparative study published by Himmelblau [31]. The exceptions are the following. The C algorithm was not compared in [31], and thus Cullum's results are used for three of the five problems. The results for Rosenbrock's problem were not tabulated in [31], and thus either the results quoted in the original publication of each algorithm or the results published by Sargent and Sebastian [46] are used, whichever were most favorable to the algorithms.

In  order  to  give  a  fair  comparison,  all  algorithms  should  be  com- 
pared on  the  basis  that  a  similar  accuracy  of  the  final  point  of  con- 
vergence was  obtained.   It  was  estimated  that  the  results  given  for 
the  existing  algorithms  were  for  a  termination  with  maximum  norm  of 
the  gradient  less  than  lO"^.   Thus  Table  4.7  summarizes  the  results  of 
the  VO  algorithm  for  the  five  test  problems  with  this  termination 
(STPEPS  set  to  1  X  lO"'^).   The  final  counters  for  the  case  when  the 
hessian,  the  gradient  and  the  function  values  are  all  supplied,  is 
included  in  the  table  although  this  option  is  not  compared  with  any 

algorithm. 

Table 4.8 gives the results of the comparisons. The table shows the number of iterations, the number of function evaluations and the number of gradient evaluations (for the algorithms that use supplied gradients) for each algorithm including the VO algorithm. With one exception, the proposed algorithm required substantially fewer iterations to achieve convergence. The exception was for the helical valley problem of Section 4.6.3. In fact, for the helical valley problem, the VO algorithm was not as efficient in general as many of the other algorithms, perhaps due to the discontinuities discussed in Section 4.6.3. Himmelblau [31] rated the F algorithm as the best of those compared in
TABLE 4.7  Summary of results for the VO algorithm with STPEPS set to 10^?. The setting of MAXAV indicates that the function, the gradient and the hessian values are supplied, if equal to 3; the function and the gradient values are supplied, if equal to 2; and only the function values are supplied, if equal to 1. The point x is the convergence point.

  Problem                  ---------- Counters ----------
  in Section   MAXAV   No. ITN   No. FUN   No. GRAD   No. HESS   ||x - x*||_inf   f(x) - f(x*)   ||f'(x)||_inf
  4.6.1          3        7         32        20         7       5x10^?           7x10^?         7x10^?
                 2        7         46        33         -       1x10^?           2x10^?         2x10^?
                 1        7         94         -         -       6x10^?           2x10^?         9x10^?
  4.6.2          3        3         15         8         3       1x10^?           8x10^?         3x10^?
                 2        3         27        20         -       1x10^?           8x10^?         3x10^?
                 1        3         80         -         -       1x10^?           7x10^?         3x10^?
  4.6.3          3        9         46        26         9       5x10^?           5x10^-13       9x10^?
                 2       10         75        57         -       5x10^?           3x10^-11       6x10^?
                 1       10        202         -         -       1x10^?           2x10^-12       2x10^?
  4.6.4          3        5         26        14         5       1x10^?           2x10^?         1x10^?
                 2        5         46        34         -       1x10^?           1x10^?         1x10^?
                 1        5        132         -         -       6x10^?           1x10^-11       9x10^?
  4.6.5          3        6         26        16         6       5x10^?           2x10^?         3x10^?
                 2        4         38        28         -       3x10^?           5x10^?         1x10^?
                 1        4        111         -         -       5x10^?           6x10^?         9x10^?
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TABLE  4.8  Results  of  comparisons  of  the  VO  algorithm  with  seven  other 
existing  algorithms  for  the  minimization  of  five  standard 
test  problems. 


Problem 

in 
Section 

Counter 

Algorithms  with  supplied 
functions  and  gradients 
FR   DFP   B    F  VO 

Algorithms  with  only 
functions  supplied 
P     S    C    VO 

No.  ITN 

27 

19 

35 

39   7 

37 

23   25    7 

4.6.1 

No.  FUN 

155 

96 

51 

47  46 

153 

152   145   94 

No.  GRAD 

28 

20 

36 

47   33 

No.  ITN 

104 

36 

38 

60   3 

25 

41   18    3 

4.6.2 

No,  FUN 

624 

434 

374 

68  27 

966 

622   148   80 

No.  GRAD 

105 

37 

39 

68   20 

No.  ITN 

36 

20 

21 

35   10 

4 

21   26   10 

4.6.3 

No.  FUN 

202 

141 

140 

42   75 

48 

191   177   202 

No.  GRAD 

37 

21 

22 

42   57 

No.  ITN 

189 

57 

42 

60   5 

25 

38         5 

4.6.4 

No.  FUN 

3288 

475 

310 

61   46 

276 

715       132 

No.  GRAD 

190 

58 

43 

61   34 

No.  ITN 

39 

96 

84 

82   4 

36 

128        4 

4.6.5 

No.  FUN 

221 

424 

350 

91   38 

3480 

1662       111 

No.  GRAD 

40 

97 

85 

91   28 
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the  study.  Clearly,  the  VO  algorithm  was  more  efficient  than  the  F 
algorithm  in  terms  of  the  number  of  iterations,  and  in  terms  of  the 
number  of  function  and  gradient  evaluations  required  in  four  of  the 
five  problems  tested. 

4.6.7  Example  with  Nonlinear  Constraints 
The test problem is given by

\[
  \min_{x} \; f(x) = (x_1 - 2)^2 + (x_2 - 1)^2 ,  \tag{4.41}
\]

subject to

\[
  u_1(x) = x_1 + x_2 - 2 \le 0 , \qquad
  u_2(x) = x_1^2 - x_2 \le 0 .
\]

The solution is at x* = (1, 1)^T, f(x*) = 1, and u_1(x*) = u_2(x*) = 0.
Therefore, the solution is at the boundary of the feasible region.
The problem will be approximated with both the cubic penalty function
method given by

\[
  \min_{x} \; \hat{f}(x) = f(x) + \mu(\max[0, u_1(x)])^3
                                + \mu(\max[0, u_2(x)])^3 ,  \tag{4.42}
\]

and the quadratic penalty function method given by

\[
  \min_{x} \; \hat{f}(x) = f(x) + \mu(\max[0, u_1(x)])^2
                                + \mu(\max[0, u_2(x)])^2 .  \tag{4.43}
\]

The initial point is

\[
  x^0 = (-1.975, \; 3.9)^T .
\]
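
The two penalty formulations can be written down directly. The following
Python sketch is only an illustration and is not taken from the thesis
programs; it assumes the objective f(x) = (x_1 - 2)^2 + (x_2 - 1)^2 and
the constraints u_1(x) = x_1 + x_2 - 2 and u_2(x) = x_1^2 - x_2, and the
function names are hypothetical.

```python
def objective(x1, x2):
    # Assumed objective of the test problem: f(x) = (x1 - 2)^2 + (x2 - 1)^2.
    return (x1 - 2.0) ** 2 + (x2 - 1.0) ** 2


def constraints(x1, x2):
    # Constraint functions; a point is feasible when both values are <= 0.
    u1 = x1 + x2 - 2.0
    u2 = x1 * x1 - x2
    return u1, u2


def penalized(x1, x2, mu, power):
    # power = 2 gives the quadratic penalty, power = 3 the cubic penalty.
    u1, u2 = constraints(x1, x2)
    violation = max(0.0, u1) ** power + max(0.0, u2) ** power
    return objective(x1, x2) + mu * violation


# At the constrained solution both penalty terms vanish.
value_at_solution = penalized(1.0, 1.0, 1.0e4, 2)

# For a small violation the cubic term is weaker than the quadratic one.
quadratic_value = penalized(1.05, 1.05, 10.0, 2)
cubic_value = penalized(1.05, 1.05, 10.0, 3)
```

For violations smaller than one, the cubic term charges less than the
quadratic term for the same value of mu, which is one way to see why a
cubic penalty may tolerate a larger constraint violation at its minimum.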


Table 4.9 summarizes the results for two values of μ and for STPEPS =
10^-?. Note that in general, the cubic penalty function (4.42) requires
more effort to solve than the quadratic one (4.43). Also, for equal
values of μ, the solution computed violates the constraints more with
the cubic penalty than with the quadratic one. It is conjectured that
both of these characteristics will always be present for any other
problem. Observe that with the hessian available, for μ = 10^4, and
with the quadratic penalty function, the algorithm could not converge
to the desired accuracy (STPEPS was 10^-?). It appears that the
discontinuities of the hessian for the quadratic penalty function
affected the algorithm in this case.

TABLE 4.9  Summary of results for the constrained problem (4.41) with
           two different penalty functions, and two different constants
           μ multiplying the penalty terms. (a) The function, the
           gradient and the hessian are supplied (MAXAV = 3). (b) The
           function and the gradient are supplied (MAXAV = 2). (c)
           Only the function is supplied (MAXAV = 1).

(a)
                    ------- Counters -------
        Penalty     No.   No.   No.   No.
μ       function    ITN   FUN   GRAD  HESS   ||f'(x)||∞   u_1(x)     u_2(x)

10      Quadratic     6    23    14     6    5x10^-?      .03        .03
        Cubic         7    28    17     7    8x10^-?      .14        .13
10^4    Quadratic    15   128    26     9    .22          3x10^-?    3x10^-?
        Cubic        19   118    36    14    8x10^-?      5x10^-?    5x10^-?

(b)
                    ---- Counters ----
        Penalty     No.   No.   No.
μ       function    ITN   FUN   GRAD   ||f'(x)||∞   u_1(x)     u_2(x)

10      Quadratic     6    39    28    2x10^-1?     .03        .03
        Cubic         7    43    31    1x10^-?      .14        .14
10^4    Quadratic    16   126    59    3x10^-1?     3x10^-?    3x10^-?
        Cubic        14   105    51    1x10^-1?     5x10^-?    5x10^-?

(c)
                    - Counters -
        Penalty     No.   No.
μ       function    ITN   FUN   ||f'(x)||∞   u_1(x)     u_2(x)

10      Quadratic     6    75   6x10^-?      .03        .03
        Cubic         6    77   2x10^-?      .14        .13
10^4    Quadratic    15   200   1x10^-?      3x10^-?    3x10^-?
        Cubic        25   301   1x10^-?      5x10^-?    5x10^-?

The difference between the two types of penalty functions manifested
itself even more when problems (4.42) and (4.43) were solved in a
sequence of unconstrained minimizations for increasingly larger values
of μ. We started with μ = 10, and used the VO algorithm to obtain a
rough approximation to the solution of problems (4.42) and (4.43).
Taking this approximate solution as the initial point, a rough
approximation to the solution of problems (4.42) and (4.43) was
similarly computed, but this time for μ = 10^2. The procedure was
repeated for μ = 10^3, and for μ = 10^4. However, the desired accuracy
of the approximate solution was tightened for μ = 10^4 (STPEPS =
10^-5). The results are summarized in Table 4.10. The accuracy of the
final solution was equivalent to the results for μ = 10^4 in Table 4.9.
The results of Table 4.10, when compared to the results of Table 4.9,
indicate that problem (4.43) with the quadratic penalty was much more
efficiently solved by a sequence of minimizations than by one
minimization with the large value of μ; the opposite was true for the
cubic penalty function. It was also evident from the results that each
minimization with the cubic penalty function was very difficult, which
implies that μ might have to be increased at a slower rate when this
penalty function is used.

TABLE 4.10  Summary of results obtained by solving problems (4.42)
            and (4.43) by a sequence of minimizations with constant
            μ set to 10, 10^2, 10^3, and 10^4. If MAXAV is 1, only
            function values are supplied. If MAXAV is 2, function
            and gradient values are supplied. If MAXAV is 3, function,
            gradient and hessian values are supplied.

                     MAXAV = 1      MAXAV = 2            MAXAV = 3
                     No.   No.      No.   No.   No.      No.   No.   No.   No.
Problem              ITN   FUN      ITN   FUN   GRAD     ITN   FUN   GRAD  HESS

Quadratic
penalty (4.43)         8   109        8    59    41        8    35    24     8

Cubic
penalty (4.42)        24   312       25   174   104       25   129    60    22
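
The sequence of minimizations with increasing μ can be sketched as
follows for the quadratic penalty (4.43). This is a hedged illustration,
not the VO algorithm: a damped Newton iteration with a backtracking line
search stands in for the minimizer, the objective is assumed to be
f(x) = (x_1 - 2)^2 + (x_2 - 1)^2, and every function name and tolerance
below is hypothetical.

```python
def penalized(x, mu):
    """Quadratic-penalty value, gradient and (generalized) hessian."""
    x1, x2 = x
    m1 = max(0.0, x1 + x2 - 2.0)     # violation of u1(x) <= 0
    m2 = max(0.0, x1 * x1 - x2)      # violation of u2(x) <= 0
    a1 = 1.0 if m1 > 0.0 else 0.0    # 1 when the penalty term is active
    a2 = 1.0 if m2 > 0.0 else 0.0
    value = (x1 - 2.0) ** 2 + (x2 - 1.0) ** 2 + mu * (m1 * m1 + m2 * m2)
    grad = (2.0 * (x1 - 2.0) + 2.0 * mu * (m1 + 2.0 * x1 * m2),
            2.0 * (x2 - 1.0) + 2.0 * mu * (m1 - m2))
    h11 = 2.0 + 2.0 * mu * (a1 + 4.0 * x1 * x1 * a2 + 2.0 * m2)
    h12 = 2.0 * mu * (a1 - 2.0 * x1 * a2)
    h22 = 2.0 + 2.0 * mu * (a1 + a2)
    return value, grad, (h11, h12, h22)


def damped_newton(x, mu, tol=1e-8, max_iter=200):
    """Minimize the penalized function from x; returns (point, iterations)."""
    for iteration in range(max_iter):
        value, grad, (h11, h12, h22) = penalized(x, mu)
        if max(abs(grad[0]), abs(grad[1])) < tol:
            return x, iteration
        det = h11 * h22 - h12 * h12  # positive: the hessian is 2I plus PSD terms
        d1 = -(h22 * grad[0] - h12 * grad[1]) / det
        d2 = -(h11 * grad[1] - h12 * grad[0]) / det
        slope = grad[0] * d1 + grad[1] * d2
        step = 1.0
        while step > 1e-10:
            trial = (x[0] + step * d1, x[1] + step * d2)
            # Sufficient-decrease test with a small allowance for rounding.
            allowance = 1e-13 * (1.0 + abs(value))
            if penalized(trial, mu)[0] <= value + 1e-4 * step * slope + allowance:
                break
            step *= 0.5
        x = trial
    return x, max_iter


def continuation(x0, mus=(10.0, 1.0e2, 1.0e3, 1.0e4)):
    """Solve for each mu in turn, starting from the previous solution."""
    x, total = x0, 0
    for mu in mus:
        x, used = damped_newton(x, mu)
        total += used
    return x, total


x_final, total_iterations = continuation((-1.975, 3.9))
```

Each minimization starts from the previous solution, so only the first
one starts far from the answer; with this stand-in minimizer the final
point lies slightly outside the feasible region, as expected for an
exterior penalty.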

4.6.8  Example  with  Box  Constraints 

The test problem proposed by Rosenbrock [44], given by

\[
  \min_{x} \; f(x) = 100(x_2 - x_1^2)^2 + (1 - x_1)^2 ,  \tag{4.44}
\]

was solved with several sets of box constraints. One of these sets
is given by

\[
  -1.5 \le x_1 \le 1.5 , \qquad 0.9 \le x_2 \le 3 .
\]

Rosenbrock's problem (4.44) has two local minima with this set of box
constraints. With the initial point

\[
  x^0 = (-1, \; 2)^T
\]

the VO algorithm converges to

\[
  x^7 = (-.94324, \; 0.9)^T , \qquad f(x^7) = 3.7868
\]

and

\[
  f'(x^7) = (-2.4 \times 10^{-?}, \; 2.0602)^T
\]

requiring 7 iterations, 77 function, 14 gradient, and 5 hessian
evaluations. With the initial point

\[
  x^0 = (0.5, \; 2)^T
\]

the VO algorithm converges to the unconstrained minimum,

\[
  x^4 = (1, \; 1)^T
\]

requiring 4 iterations, 27 function, 12 gradient, and 4 hessian
evaluations. The set of box constraints given by

\[
  -0.02 \le x_1 \le 0.8 , \qquad 0.2554 \le x_2 \le 3
\]

illustrates the possibility of creating what in effect is a saddle
point. The point

\[
  x^0 = (-0.02, \; 0.2554)^T
\]

at which the gradient is

\[
  f'(x^0) = (0, \; 51)^T ,
\]

is a saddle point since the gradient indicates that any direction d
from x^0, such that

\[
  f'(x^0)^T d < 0
\]

is satisfied, points towards the outside of the box constraints;
therefore any typical gradient method such as steepest descent will
stop at this point. This phenomenon has been called jamming in the
literature [56]. However, the hessian is not positive semidefinite for
all possible feasible directions, and thus the point is a saddle point.
The minimum is computed by the VO algorithm to be

\[
  x^2 = (0.8, \; 0.64)^T
\]

requiring 2 iterations, 23 function, 5 gradient, and 2 hessian
evaluations.
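
The values quoted in this example can be checked with a few lines of
Python. The sketch below is only an illustration with hypothetical
function names; it evaluates Rosenbrock's function, its gradient and its
hessian, and forms a steepest-descent direction in which every component
that would leave the box is set to zero (an assumed, simple way to show
the jamming described above).

```python
def rosenbrock(x1, x2):
    return 100.0 * (x2 - x1 * x1) ** 2 + (1.0 - x1) ** 2


def gradient(x1, x2):
    g1 = -400.0 * x1 * (x2 - x1 * x1) - 2.0 * (1.0 - x1)
    g2 = 200.0 * (x2 - x1 * x1)
    return g1, g2


def hessian(x1, x2):
    # Entries (h11, h12, h22) of the symmetric 2 x 2 hessian.
    h11 = 1200.0 * x1 * x1 - 400.0 * x2 + 2.0
    h12 = -400.0 * x1
    h22 = 200.0
    return h11, h12, h22


def projected_descent(g, x, lower, upper):
    """Steepest-descent direction with box-violating components zeroed."""
    direction = []
    for gi, xi, lo, hi in zip(g, x, lower, upper):
        di = -gi
        if (xi <= lo and di < 0.0) or (xi >= hi and di > 0.0):
            di = 0.0
        direction.append(di)
    return direction


corner = (-0.02, 0.2554)
g_corner = gradient(*corner)
d_corner = projected_descent(g_corner, corner, (-0.02, 0.2554), (0.8, 3.0))
curvature_along_x1 = hessian(*corner)[0]
```

At the corner point the projected direction vanishes, so a pure gradient
method has nowhere to go, while the curvature along the feasible x_1
direction is negative, which is the information a hessian-based method
can use to leave the point.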

4.6.9  Example  with  Errors  in  the  Supplied  Function  and  Gradient 
Rosenbrock's problem given by

\[
  \min_{x} \; f(x) = 100(x_2 - x_1^2)^2 + (1 - x_1)^2
\]

was minimized with the introduction of random errors in the function
and the gradient values at each iteration. The errors were generated
as follows. At each iteration, the actual value of the function
returned was

\[
  \hat{f}(x^i) = f(x^i) + e_{fa} + e_{fr} \, |f(x^i)| ,
\]

where e_fa and e_fr were random numbers satisfying

\[
  -\varepsilon_{fa} \le e_{fa} \le \varepsilon_{fa}
\]

and

\[
  -\varepsilon_{fr} \le e_{fr} \le \varepsilon_{fr} .
\]

Similarly, each gradient term returned was given by

\[
  [\hat{f}'(x^i)]_j = [f'(x^i)]_j + e_{ga} + e_{gr} \, |[f'(x^i)]_j| ,
\]

where e_ga and e_gr were random numbers satisfying

\[
  -\varepsilon_{ga} \le e_{ga} \le \varepsilon_{ga}
\]

and

\[
  -\varepsilon_{gr} \le e_{gr} \le \varepsilon_{gr} .
\]

The actual implementation is given in Appendix I. The VO algorithm
was then used to solve this problem with three sets of values of ε_fa
(FABS), ε_fr (FREL), ε_ga (GFABS), and ε_gr (GFREL). The results are
summarized in Table 4.11. Observe that the overall effectiveness of
the algorithm when function and gradient values are supplied still
remains quite good in spite of the errors present. However, when only
function values are supplied, the VO algorithm is not able to
approximate the solution accurately except when the error in the
function values is relatively small.
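
The error model of this section can be sketched in Python as follows.
This is not the implementation of Appendix I; it is a minimal
illustration that assumes uniformly distributed errors and assumes that
new random numbers are drawn for every gradient component. The function
names are hypothetical, and the bound arguments correspond to FABS,
FREL, GFABS and GFREL.

```python
import random


def noisy_value(exact, abs_bound, rel_bound, rng):
    # Returned value = exact + e_a + e_r * |exact|, with |e_a| <= abs_bound
    # and |e_r| <= rel_bound (uniform sampling is an assumption).
    e_abs = rng.uniform(-abs_bound, abs_bound)
    e_rel = rng.uniform(-rel_bound, rel_bound)
    return exact + e_abs + e_rel * abs(exact)


def noisy_gradient(exact_gradient, abs_bound, rel_bound, rng):
    # Each gradient component is perturbed independently (an assumption).
    return [noisy_value(g, abs_bound, rel_bound, rng) for g in exact_gradient]


rng = random.Random(0)
perturbed = noisy_value(24.2, 1.0e-3, 1.0e-2, rng)
```

By construction the perturbed value never differs from the exact one by
more than abs_bound + rel_bound * |exact|, which is the kind of error
estimate the algorithm is given through FABS, FREL, GFABS and GFREL.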

4.6.10  Simple Circuit Optimization Example

This example is a simple circuit optimization which was run in
the general circuit optimization program AOP [28]. This example was
chosen to illustrate the effect of rather large and difficult to
estimate errors that might be present in the function and the gradient
evaluation. The circuit is shown in Fig. 4.5. In effect two
independent sub-circuits are present in the circuit. The designable
parameters or independent variables are R_1, C_2, C_3 and R_4. The
function to be minimized is given by

\[
  f = \int_0^5 \left[ (V_{C_2} - .2t)^2 + (V_{R_4} - .75)^2 \right] dt .
\]

The rise time of the time dependent voltage source of the circuit was
extremely small (10^-? sec); this small rise time yields a rather large
error in the first time step of the numerical integration used in
evaluating the function value. It was estimated that the maximum
overall error in the function and the gradient values computed was
5 x 10^-? in absolute value, and 5 x 10^-? in relative value. Actually,
the maximum error is much larger at the initial point given by

\[
  (R_1, C_2, C_3, R_4)^0 = (1, \; 1.5, \; 1.5, \; 1)
\]

than at points in the neighborhood of the solution which is
approximately given by

\[
  (R_1, C_2, C_3, R_4) \approx (1.34, \; 2.29, \; 3, \; 3) .
\]

Figure 4.5  Simple circuit to test and compare the VO algorithm.

The minimization algorithm in the AOP program is the one proposed by
Cullum, which is a quasi-Newton algorithm with a rank one update and
with cyclic scalar searches at some iterations [12]. The AOP program
with Cullum's algorithm converged to the point

\[
  (R_1, C_2, C_3, R_4)^{47} = (1.43, \; 2.14, \; 2.89, \; 3.14)
\]

after 47 iterations, requiring 160 function evaluations, and 50 gradient
evaluations. The gradient at the point of convergence was

\[
  f' = (1.14 \times 10^{-?}, \; 7.6 \times 10^{-?}, \; -934.9, \;
        8.3 \times 10^{-?})^T .
\]

The AOP program using the VO algorithm with the proposed modifications
converged to

\[
  (R_1, C_2, C_3, R_4)^5 = (1.341, \; 2.291, \; 3.003, \; 2.999)
\]

in 5 iterations, requiring 46 function evaluations, and 37 gradient
evaluations. The gradient at the point of convergence was

\[
  f' = (-1.8 \times 10^{-?}, \; -1.1 \times 10^{-?}, \; 112.1, \;
        2.9 \times 10^{-?})^T .
\]

The computer time required for the original run was 54.22 seconds,
while the VO algorithm run required 20.22 seconds. Observe that not
only was the computer time required by the VO algorithm smaller, but
the convergence point obtained was a better approximation of the
solution than for the algorithm in AOP. The VO algorithm was somewhat
sensitive to the estimate of the error present in the gradient and the
function. For example, since it is not known exactly how much error
is present in the gradient and the function values, several runs were
made with different settings of the variables FABS, FREL, GFABS, and
GFREL; the runs, while generally converging to the same point,
required as few as 4 iterations and as many as 8 iterations, but
the computer times required were always less than 30 seconds.

4.6.11  MOSFET  Nand  Gate  Circuit  Optimization  Example 

Consider  the  two  input  MOSFET  nand  gate  shown  in  Fig.  4.6.   This 
gate  consists  of  one  small  load  device  Tl  and  two  medium  size  devices 
T2  and  T3.   The  design  objective  of  this  circuit  is  to  minimize  the 
area  of  the  devices  and  the  power  dissipation.   The  loose  constraints 
are  that  the  propagation  delay  should  not  exceed  about  110  nsec,  the 
output  voltage  should  not  be  less  than  about  -.7  volts,  and  box  con- 
straints on  the  designable  parameters  to  insure  a  circuit  which  can  be 
manufactured. 

The three designable parameters are the width w_1 and the length ℓ_1
of the load device T1, and the equal width w_2 of each of the other
two devices T2 and T3. The lengths of T2 and T3 are fixed at 5.08 mils.
The elements of each device model in Fig. 4.6(b) [5] are functions of
the three designable parameters as follows. The drain current (units
in ma) is given by

\[
  J_{DS} =
  \begin{cases}
    G_m (w/\ell)(V_{GS} - V_T)^2 / 2 , & \text{above pinch-off,} \\
    G_m (w/\ell) V_{DS} (V_{GS} - V_T - V_{DS}/2) , & \text{below pinch-off,}
  \end{cases}
\]

where G_m is the normalized transconductance (6 x 10^-3 mmhos), w and ℓ
are the width and length of the device in mils, V_GS is the gate to
source voltage in volts, V_DS is the drain to source voltage in volts,
and V_T is the threshold voltage of the device given by

\[
  V_T = V_{FB} - (V_{SS} + P_{SI})^{1/2} / 2 ,
\]

where V_FB is the flat band voltage (1.62 volts), V_SS is the source to
substrate voltage, and P_SI is the electrostatic potential (.5771 volts).
The capacitors are also functions of the width and length of each
device given by (units in pf.)

    CGS = 3.87 x 10^-? (ℓ + 1.27) w

    CGD = 4.92125 x 10^-? w

    CSS = .022 + 4.33 x 10^-3 w ,  for devices T2 and T3
    CSS = 1.968 x 10^-? w ,        for device T1

Figure 4.6  (a) Two-input nand gate optimized. (b) The MOSFET model
            used.

We must now express the design objectives in terms of a scalar
performance function, the function to be minimized, with constraints.
The total area of the devices is given by

\[
  A = w_1 \ell_1 + 10.16 \, w_2  \tag{4.45}
\]

and the power dissipated is minimized if

\[
  D = (J_{DS})^2  \tag{4.46}
\]

is minimized, where J_DS is the drain current of any of the devices.
The propagation delay constraint normally requires a transient analysis.
However, it can be shown [24] that

\[
  T = 76.666667 \, \ell_1 / w_1 ,  \tag{4.47}
\]

is a good approximation to the propagation delay, which allows for a
computationally less expensive optimization run. Thus, this constraint
becomes

\[
  P_D = 76.666667 \, \ell_1 / w_1 - 110 \le 0 .  \tag{4.48}
\]

The output voltage constraint may be expressed by

\[
  O_V = -(V_{out} + .7) \le 0 .  \tag{4.49}
\]

Finally, the box constraints are given by

\[
  5 \le w_1 \le 50 , \qquad 5 \le \ell_1 \le 50 , \qquad
  50 \le w_2 \le 250 .
\]

Letting

\[
  x = (w_1, \ell_1, w_2)^T ,  \tag{4.50}
\]

this problem may be expressed by

\[
  \min_{x} \; f(x) = A + 10 D + 10^{?} (\max[0, P_D])^m
                       + 10^{?} (\max[0, O_V])^m  \tag{4.51a}
\]

\[
  \text{subject to} \quad x^L \le x \le x^H ,  \tag{4.51b}
\]

where x^L = (5, 5, 50)^T, and x^H = (50, 50, 250)^T. The constants in
the function (4.51a) were determined elsewhere [5] to give a reasonable
solution to this problem. Observe that in (4.51a) for m = 2, we obtain
the quadratic penalty function method, and for m = 3 the cubic penalty
function method. Problem (4.51) was solved in the AOP program beginning
with the initial point

\[
  x^0 = (10.2, \; 12.7, \; 228.6)^T , \qquad f(x^0) = 2452.8 .  \tag{4.52}
\]

The AOP program, using the minimization algorithm proposed by Cullum
[12] and m = 2 in (4.51), converges to the point

\[
  x^{31} = (8.613, \; 12.36, \; 142.07)^T , \qquad f(x^{31}) = 1550.3
\]

requiring 31 iterations, 213 function evaluations, and 36 gradient
evaluations. The computer time required was 1 minute and 41.54 seconds.
At this point the nonlinear constraints are slightly negative; thus
they are satisfied, which is an indication that the solution is
inaccurate [34]. The area of the devices given by (4.45) is 1549.9
mils^2 and (4.46) yields .04468 ma^2. The next run is AOP with the VO
algorithm from the same initial point. This time the algorithm
converged to

\[
  x^7 = (5, \; 7.1755, \; 139.755)^T , \qquad f(x^7) = 1467.69
\]

using the quadratic penalty method (m = 2). Observe that this solution
yields a smaller function value and thus it is more accurate than the
previous solution. Moreover, the VO algorithm required 7 iterations, 95
function evaluations, 33 gradient evaluations, and 49.89 seconds of
computer time. The area of the devices at this point is 1455.79 mils^2,
(4.46) yields .0444 ma^2, and the constraints are slightly larger than
zero, but the solution is considered satisfactory. The VO algorithm
with the cubic penalty function method (m = 3) converges to

\[
  x^8 = (5, \; 7.1778, \; 126.932)^T , \qquad f(x^8) = 1371.84 .
\]

The constraints, while still basically satisfactory, are slightly more
positive than for the quadratic penalty function, as expected. The
area of the devices is 1325.52 mils^2 and (4.46) yields .0429 ma^2. This
minimization required 8 iterations, 101 function evaluations, 40
gradient evaluations, and 53.32 seconds of computer time. Once again,
the quadratic penalty function method was slightly more efficient than
the cubic penalty function method. The results of this numerical
experiment are summarized in Table 4.12.
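
The parts of the performance function that do not need a circuit
simulation, namely the area (4.45), the delay constraint (4.48) and the
box constraints, can be evaluated directly at the reported points. The
following Python sketch is an illustration only; the function names are
hypothetical, and the drain current term and the output voltage
constraint are left out because they require the circuit solution.

```python
def device_area(w1, l1, w2):
    # A = w1 * l1 + 10.16 * w2: the load device plus two drivers whose
    # length is fixed at 5.08 mils each.
    return w1 * l1 + 2.0 * 5.08 * w2


def delay_constraint(w1, l1, limit_nsec=110.0):
    # P_D = 76.666667 * l1 / w1 - 110; satisfied when the value is <= 0.
    return 76.666667 * l1 / w1 - limit_nsec


def in_box(x, lower=(5.0, 5.0, 50.0), upper=(50.0, 50.0, 250.0)):
    # Box constraints on x = (w1, l1, w2).
    return all(lo <= xi <= hi for xi, lo, hi in zip(x, lower, upper))


quadratic_point = (5.0, 7.1755, 139.755)   # VO result with m = 2
cubic_point = (5.0, 7.1778, 126.932)       # VO result with m = 3

area_quadratic = device_area(*quadratic_point)
area_cubic = device_area(*cubic_point)
delay_quadratic = delay_constraint(quadratic_point[0], quadratic_point[1])
delay_cubic = delay_constraint(cubic_point[0], cubic_point[1])
```

With the rounded coordinates quoted above, the delay constraint comes
out slightly positive at both VO points and a little larger for the
cubic penalty, in line with the discussion of the constraint violations.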


4.6.12  Power  Supply  Regulator  Circuit  Optimization  Example 

The last example is the design of the power regulator [2] shown
in Fig. 4.7. This regulator has two zener diodes, one pnp and three
npn bipolar transistors. The design objective is to minimize the
variation of the output voltage, VOUT, about a constant level. It is
also desired that the power dissipation be reduced, if possible. The
designable parameters are all of the resistors in the circuit, except
the load, and the capacitor; thus this example has a total of seven
designable parameters.

Figure 4.7  (a) Power supply regulator optimized; the load resistor is
            RL = 200 Ω. (b) The zener diode model used. (c) The
            model for the bipolar transistors.

The equations describing the device models of the circuit are the
following [2]. Resistance units are given in KΩ, capacitance units
are in pf., currents are in ma, voltages are in volts, and time units
are in nsec.

Zener Z1. The zener model is shown in Fig. 4.7b, and the circuit
elements are defined by

    RZS = .028,
    RZP = 100,
    CZ  = 463/(.75 - VCZ)^? + 4.09x10^?(JZ + 2.69x10^-?),
    JZ  = 2.69x10^-?(e^((35.3)VCZ) - 1),
    JZB = Table 4.13a.

Zener Z2. The zener model is shown in Fig. 4.7b, and the circuit
elements are defined by

    RZS = .005,
    RZP = 10^4,
    CZ  = 285/(.75 - VCZ)^? + 2.67x10^?(JZ + 4.6x10^-?),
    JZ  = 4.6x10^-9(e^((29.7)VJZ) - 1),
    JZB = Table 4.13b.

PNP Transistor T1. The bipolar model is shown in Fig. 4.7c, and
the circuit elements are defined by

    RBB = .00182,
    RCC = 10^-?,
    RE  = 1.99x10^?,
    RC  = 1.89x10^?,
    CE  = 83.8/(1.63 + VCE)^? + 105(JE + 3.56x10^-?),
    CC  = 72.7/(.936 + VCC)^? + 475(JC + 1.44x10^-?),
    JE  = -3.56x10^-?(e^(-(33)VCE) - 1),
    JC  = -1.44x10^-?(e^(-(23.9)VCC) - 1),
    JA  = -(.952)JE,
    JB  = -(.667)JC.

NPN Transistors T2, T3, and T4. The bipolar model is shown in
Fig. 4.7c, and the circuit elements are defined by

    RBB = .01,
    RCC = .001,
    RE  = 5x10^?,
    RC  = 6x10^?,
    CE  = 54.6/(.713 - VCE)^? + 56.5(JE + 2.19x10^-?),
    CC  = 2690/(1.68 - VCC)^? + 258(JC + 2.26x10^-?),
    JE  = 2.19x10^-?(e^((?)VCE) - 1),
    JC  = 2.26x10^-?(e^((32.4)VCC) - 1),
    JA  = (.988)JE,
    JB  = (.833)JC.

TABLE 4.13  Tabular set of points describing the current source JZB
            for zener Z1 (a), and zener Z2 (b) for the power supply
            regulator circuit of Fig. 4.7.

           (a)                            (b)
    VJZB = -VCZ    JZB             VJZB = -VCZ    JZB

       -3          -10^?              -3          -10^?
       -1          -6x10^?            -1          -6x10^?
       -.8         -500               -.8         -500
       -.6         -50                -.6         -50
       -.5         -2                 -.5         -2
       -.48        -10^-6             -.48        -10^-?
        0           10^-6              0           10^-?
        1           .01                1           10^-?
        3.3         .01                6.2         10^-?
        3.35        1                  6.25        1
        3.44        20                 6.35        20
        3.46        50                 6.4         50

We must now express the design objectives in terms of a scalar
performance function. The input waveform is defined by

\[
  \mathrm{VIN} = 10 + 2 \sin(2\pi t / 100)
\]

which is a sine wave centered at 10 volts. For VIN = 10 v., the output
voltage, VOUT, is approximately 7.5 v. at the initial guess given below.
The design objective is that VOUT varies from 7.5 v. as little as
possible in a transient analysis from t = 0 to t = 100. This variation
may be expressed by

\[
  \mathrm{DIFF} = \int_0^{100} (\mathrm{VOUT} - 7.5)^2 \, dt .
\]

Since it is also desired to reduce the power dissipation, the
minimization problem is defined by

\[
  \min_{x} \; f(x) = \int_0^{100} \left[ (10^4) \, \mathrm{DIFF}
                     + (\mathrm{IIN})^2 \right] dt
\]

where IIN is the current into the network, and the independent variables
or designable parameters are

\[
  x = (R1, R2, R3, R4, R5, R6, C1)^T .
\]

All that remains is to specify the box constraints on the independent
variables, which are given by

      1 <= R1 <= 100,
     .1 <= R2 <= 100,
      5 <= R3 <= 100,
      1 <= R4 <= 100,
      1 <= R5 <= 100,
     .5 <= R6 <= 100,
    100 <= C1 <= 10^?.

The initial guess to the solution is

\[
  x^0 = (5, \; 1, \; 30, \; 6.5, \; 5.25, \; 1.25, \; 5 \times 10^{?})^T
\]

at which point, DIFF = 147.96, and

\[
  f(x^0) = 1.6437 \times 10^{?} ,
\]

\[
  f'(x^0) = (-6 \times 10^{-?}, \; 2.7 \times 10^{?}, \; -2 \times 10^{?},
             \; -10^{?}, \; 9 \times 10^{?}, \; 4 \times 10^{?},
             \; -2 \times 10^{-?})^T .
\]

The AOP program [28] with the minimization algorithm proposed by Cullum
[12] requires 8 iterations, 11 gradient and 56 function evaluations to
obtain

\[
  x^8 = (72.53, \; 14.2, \; 100, \; 1, \; 26.73, \; .5, \;
         5.32 \times 10^{?})^T ,
\]

\[
  f(x^8) = 1.108 \times 10^{?} ,
\]

\[
  f'(x^8) = (-10^{-?}, \; 10^{?}, \; -102, \; 2 \times 10^{?}, \; -36.3,
             \; 5 \times 10^{?}, \; -4 \times 10^{-?})^T ,
\]

and DIFF = 95.763. The AOP program with the VO algorithm requires 4
iterations, 42 gradient and 57 function evaluations to obtain

\[
  x^4 = (81.48, \; 16.47, \; 100, \; 1, \; 100, \; .5, \;
         5.23 \times 10^{?})^T ,
\]

\[
  f(x^4) = 1.0999 \times 10^{?} ,
\]

\[
  f'(x^4) = (-10^{-?}, \; 2 \times 10^{-?}, \; -105, \; 2 \times 10^{?},
             \; -2.76, \; 5 \times 10^{?}, \; -4 \times 10^{-?})^T ,
\]

and DIFF = 95.125. The computer times required by AOP were
approximately 3.2 minutes with Cullum's algorithm and approximately 4.5
minutes with the VO algorithm. Cullum's algorithm solved this problem
more efficiently than the VO algorithm. Observe that Cullum's algorithm
solved this problem in eight iterations, which is the number of
designable parameters (or the number of independent variables) plus one.
Due to the overhead in the VO algorithm of approximating the hessian by
differences when the hessian is not supplied, the VO algorithm may be
generally less efficient than other algorithms for problems which are
solved by the other algorithms in a small number of iterations. This
observation is generally supported by the comparison with seven other
algorithms given in Section 4.6.6.
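
The variation measure DIFF used in this example is an integral of a
squared deviation over the transient. The Python sketch below is an
illustration only: it approximates such an integral from sampled values
with the trapezoidal rule, which is an assumed choice and not necessarily
the integration method used in AOP, and it uses an artificial sinusoidal
ripple in place of a simulated VOUT.

```python
import math


def squared_deviation_integral(times, vout, target=7.5):
    # Trapezoidal approximation of the integral of (VOUT - target)^2 dt.
    total = 0.0
    for k in range(1, len(times)):
        left = (vout[k - 1] - target) ** 2
        right = (vout[k] - target) ** 2
        total += 0.5 * (left + right) * (times[k] - times[k - 1])
    return total


# Artificial output: 7.5 v. plus a 0.1 v. ripple with a period of 100 nsec.
times = [0.1 * k for k in range(1001)]
ripple = [7.5 + 0.1 * math.sin(2.0 * math.pi * t / 100.0) for t in times]
diff = squared_deviation_integral(times, ripple)
```

For this artificial ripple the exact integral is 0.1^2 * 100 / 2 = 0.5,
so the sampled value gives a quick check of the integration step before
it is applied to a simulated waveform.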

4.7  Summary 

In  this  chapter  the  VO  algorithm  was  successfully  implemented, 
and  extended  to  practical  problems  that  occur  in  computer-aided  opti- 
mization of  circuits.   Numerical  results  on  several  problems  were  given. 
Comparison  of  the  VO  algorithm  with  published  results  of  seven  other 
popular  algorithms  on  solving  five  of  the  test  problems  indicates  that 
the  VO  algorithm  is  more  efficient  in  solving  four  out  of  the  five 
problems.   In  addition,  the  VO  algorithm  was  shown  to  be  very  effective 
in  avoiding  convergence  to  the  saddle  point  of  one  of  the  tested  prob- 
lems, even  when  the  initial  guess  was  very  close  to  the  saddle  point. 
As  long  as  errors  that  may  be  present  in  the  evaluation  of  the  function 
and  the  gradient  values  supplied  to  the  algorithm  can  be  estimated  and 
kept  small,  numerical  results  given  indicate  that  the  VO  algorithm  may 
still  be  used  effectively.   Due  to  the  hessian  being  approximated  by 
differences  when  it  is  not  supplied,  the  results  also  indicate  that  the 
VO  algorithm  may  be  less  efficient  than  other  algorithms  for  problems 
that  can  be  solved  in  a  small  number  of  iterations  by  the  other  algo- 
rithms. 


CHAPTER  5 
APPLICATION  OF  THE  VARIABLE-ORDER  CONCEPT  TO  CIRCUIT  ANALYSIS 

In  this  chapter,  we  discuss  the  application  of  the  principles 
behind  the  VO  algorithm  to  generate  a  class  of  iterative  methods  for 
solving  nonlinear  systems  of  equations  of  the  form 

F(x)  =  0   ,  (5.1) 

where F: E^n -> E^n is a nonlinear vector function. In most computer-aided
design  and  analysis  procedures  problem  (5.1)  is  solved  many  times  (the 
number  of  times  may  be  in  the  thousands).   Thus  small  improvements  to 
existing  methods  for  solving  (5.1)  may  translate  into  substantial 
savings  to  the  overall  design  or  analysis  procedure. 

The  class  of  iterative  solution  methods  is  based  on  the  infinite 
series  representation  of  a  solution  to  a  system  of  nonlinear  equations 
derived  earlier  in  Section  3.3.   By  suitable  selection  of  the  arbitrary 
vector  function  that  appears  in  the  series,  particular  infinite  series 
representations  of  a  solution  to  (5.1)  are  obtained.   Iterative  methods 
are  obtained  by  truncation  of  the  resulting  series.   Many  of  the  itera- 
tive methods  obtained  in  this  manner,  such  as  Newton's  method,  are  well 
known.   However,  the  derivation  technique  appears  to  be  both  novel,  and 
capable  of  yielding  any  number  of  new  iterative  methods. 
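
As a concrete point of reference for the methods discussed in this
chapter, the following Python sketch applies Newton's method to a single
nonlinear node equation of the kind met in dc circuit analysis. It is an
illustration only and is not the VO iterative method; the circuit (a
voltage source, a series resistor and a diode), the element values and
the function names are all hypothetical.

```python
import math


def diode_residual(v, source=5.0, resistance=1000.0, i_sat=1.0e-12, v_t=0.025):
    """Node equation F(v) and its derivative dF/dv for the diode voltage v."""
    exp_term = math.exp(v / v_t)
    f = (v - source) / resistance + i_sat * (exp_term - 1.0)
    df = 1.0 / resistance + i_sat * exp_term / v_t
    return f, df


def newton(v, tol=1.0e-12, max_iter=50):
    """Newton's iteration v <- v - F(v)/F'(v); returns (v, iterations used)."""
    for iteration in range(max_iter):
        f, df = diode_residual(v)
        if abs(f) < tol:
            return v, iteration
        v -= f / df
    return v, max_iter


v_diode, iterations = newton(0.6)
```

Starting from 0.6 volts, which is a good initial guess for this
equation, the iteration settles quickly; a much poorer starting value
would need many more steps, which is the situation addressed later in
the chapter.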

If a good initial guess to a solution of (5.1) is available, then
one iterative method which uses the variable-order concept introduced
in Chapter 3 appears to be very promising. This Variable-Order (VO)
iterative method was implemented in the transient analysis portion of
an already very efficient circuit analysis program, improving its
efficiency still further. It is estimated that only a very small effort
is required to implement the VO iterative method in most existing
circuit analysis programs.

If  the  initial  guess  to  a  solution  of  (5.1)  is  poor,  experimental 
evidence  is  given  that  indicates  the  desirability  of  initially  using 
an  iterative  method  which  depends  on  the  behavior  of  each  function  in 
F.   The  class  of  iterative  methods  derived  in  this  chapter  is  suitable 
for,  in  effect,  tailoring  the  iterative  method  to  F.  However,  only 
general  guidelines  are  given,  since  a  general  algorithm  does  not  seem 
possible  at  this  time. 

This chapter begins with a brief review of general approaches that
have been proposed for solving (5.1). In the second section several
iterative methods, including the VO method, are derived from infinite
series representations of a solution to (5.1). In the third section we
consider the implementation of the VO method in a transient analysis
program for the simulation of electronic circuits. Finally, the fourth
section gives guidelines for solving (5.1) when the initial guess is
poor, as in dc analysis of electronic circuits, using a class of
iterative methods derived from infinite series representations of a
solution to (5.1). Examples are given which illustrate the techniques.

5.1  Approaches  in  Finding  a  Solution 
The problem to be solved may be expressed by

    F(x) = 0 ,  (5.2)

where F: E^n -> E^n is a nonlinear vector function of x \in E^n. We wish to
find a point x^* \in E^n such that x = x^* satisfies (5.2). It is assumed
that such a point exists; however, it is important to recognize that
for general problems a point satisfying (5.2) may not exist, or
alternatively, there may be many points that satisfy (5.2) [38].

There are two practical approaches for finding a solution of (5.2).
The first approach converts the problem into an unconstrained
minimization problem, which may then be solved by a minimization algorithm.
The second approach is to use a more efficient, but perhaps less stable,
iterative method derived directly for the solution of (5.2).

5.1.1  Equivalent  Unconstrained  Minimization  Problems 

Let the vector function F(x) have component functions given by
f_i(x), i = 1, ..., n. That is, define F(x) by

    F(x) = (f_1(x), f_2(x), ..., f_n(x))^T .  (5.3)

Then the class of unconstrained minimization problems given by

    minimize_x  f(x) = \sum_{i=1}^{n} (f_i(x))^{2m} ,  (5.4)


where m is a positive integer, is equivalent to problem (5.2), provided
that a global minimum point of f(x) is computed [38]. For example, for
m = 1, problem (5.4) may be expressed by

    minimize_x  f(x) = F(x)^T F(x) .  (5.5)

A necessary condition for a solution x^* of an unconstrained minimization
problem was given in Section 3.1 to be that the gradient of f(x),
at x = x^*, vanishes. For (5.5) this condition yields the system of
equations

    f'(x) = 2 F'(x)^T F(x) = 0 ,  (5.6)

where F'(x) is called the Jacobian matrix of F, with (i,j) element given
by

    [F'(x)]_{ij} = \partial f_i(x) / \partial x_j .  (5.7)

The gradient in (5.6) vanishes at points that are solutions of the
problem (5.2) being solved. However, the major difficulty is that it may
also vanish at points for which F(x) is not zero. Therefore, since
unconstrained minimization algorithms, including the VO algorithm
derived in the preceding two chapters, compute local minimum solutions,
the computed solution of (5.4) may not be a solution of (5.2). In
addition, when a good initial guess to a solution x^* of (5.2) is
available, the iterative methods discussed in the next section are generally
more efficient. On the other hand, the availability of general
minimization algorithms, such as the VO algorithm, allows for an easy
method of solving the general problem (5.2) via (5.5), if we are willing
to try several different initial guesses until the computed
minimum satisfies (5.2).

For example, the VO algorithm for unconstrained minimization given
in Section 4.5 was used to compute the saddle point of Wood's function
given in Section 4.6.4. The saddle point was computed by solving the
unconstrained minimization problem

    minimize_x  f'(x)^T f'(x) ,

where f'(x) is the gradient vector of Wood's function. The VO algorithm
was used with an initial guess near the saddle point. Unless the
initial guess was very close to the saddle point, Newton's iteration,
described in the next section, failed because the Jacobian, which is
the Hessian of Wood's function, is singular or nearly singular at several
points in the neighborhood of the saddle point.
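
The difficulty with the minimization formulation can be seen on a very small case. The following Python sketch is only an illustration: the scalar function F(x) = x^2 + 1 is a hypothetical choice, not taken from the text, picked because it has no real zero. The gradient (5.6) of f = F^T F nevertheless vanishes at x = 0, so a minimization algorithm may stop there although F(0) is not zero.

```python
def F(x):
    # Hypothetical scalar "system" with no real zero (assumption for illustration).
    return x * x + 1.0

def F_prime(x):
    # Jacobian of F, here a 1 x 1 matrix.
    return 2.0 * x

def grad_f(x):
    # Scalar form of (5.6): f'(x) = 2 F'(x) F(x), where f(x) = F(x)^2.
    return 2.0 * F_prime(x) * F(x)

x_stationary = 0.0
print("gradient of f:", grad_f(x_stationary))   # vanishes
print("value of F:   ", F(x_stationary))        # but F is not zero here
```

The stationary point is a genuine minimum of f, which is why several initial guesses may be needed before the computed minimum also satisfies (5.2).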

5.1.2   Iterative  Methods 

Iterative methods for solving (5.2) may be thought of as attempting
to approximate the behavior of F(x) in the neighborhood of a point x^k
by a simple vector function. An iterative method follows by finding
x^{k+1} as the solution of the simpler problem, and repeating the process
[38].

Probably the most basic iteration, certainly the most popular in
circuit analysis programs [e.g. 3,6,9,13,17,27,52], is Newton's method:

    x^{k+1} = x^k - F'(x^k)^{-1} F(x^k) .  (5.8)

This iteration has some important advantages: 1) when it
converges, the order of convergence is two for most problems [38], and
2) the elements of the Jacobian matrix F'(x) have natural circuit
component interpretations, and therefore the iteration can often be written
by inspection [18]. On the other hand, the disadvantages are: 1) it
may not converge for poor initial guesses [33], 2) the Jacobian matrix
is required, and 3) often a large system of equations must be solved.
These disadvantages are explored in more detail below.

The most severe problem with Newton's iteration is its generally
poor behavior when x^k is not in a small neighborhood about a solution
x^* (other methods [38], however, encounter similar difficulties). In
fact, the problem of solving (5.2) with poor initial guesses, generally
manifested in the dc analysis of nonlinear circuits, is a problem which
has not been satisfactorily solved, and an entirely satisfactory
solution is not offered here. Some modifications of Newton's iteration have
been proposed that "damp" the iteration by

    x^{k+1} = x^k - p F'(x^k)^{-1} F(x^k) ,  (5.9)

where p is arbitrarily specified to be less than one initially while
x^k is far from x^*, and set to one when x^k approaches x^* [3,18]. Other
methods, called continuation methods by numerical analysts [38], may
be described as converting the dc problem into a transient problem
where the dc solution is approached asymptotically as the variable time
becomes large [52]. A previously unreported method is described in
Section 5.4, which combines the basic idea of these two methods with
the variable-order concept. However, while encouraging numerical
results are given, no definite theoretical results appear to be possible.

The Jacobian matrix has n^2 elements which must be determined. This
disadvantage has been effectively eliminated. In circuit analysis, a
large number of these elements are zero and many of the nonzero elements
are constant. In addition, the nonzero elements that are not constant
are computed by difference approximations with excellent results [52].
In order to take advantage of the fact that many of the Jacobian
elements are zero, and the fact that many of the nonzero elements are
constant, several sparse matrix methods of implementing the iteration
(5.8) have been proposed [27,52]. A brief description of these
implementations is given later.

5.2   Infinite  Series  Representation  of  a  Solution 

In Section 3.3 we derived an infinite series representation of
the point x^* at which the gradient of a scalar function is zero. This
result is directly applicable for a solution x^* to (5.2), and it is
given by

    x^* = x^k + X'(z^k)(z^* - z^k)
              + (1/2)[X''(z^k)(z^* - z^k)](z^* - z^k) + ... ,  (5.10)

where

    x^k = X(z^k) ,

    F(X(z)) = G(z) ,

    z^* = G^{-1}(0) ,

    z^k = G^{-1}(F(x^k)) ,

    X'(z^k) = F'(x^k)^{-1} G'(z^k) ,

    X''(z^k) = F'(x^k)^{-1} [ G''(z^k) - [F''(x^k) X'(z^k)] X'(z^k) ] ,

with any higher derivatives of X(z) evaluated at z = z^k obtained by
further differentiation and the use of the chain rule. Under the
assumption that all the required derivatives and the inverse of the
Jacobian F'(x^k) exist, a selection of a function G(z), which should
be simple to invert, yields a particular form of the series. We now
consider two possibilities for generating iterative methods from (5.10).


5.2.1  A  Class  of  Iterative  Methods 

If only two terms are kept in the infinite series (5.10), a class
of iterative methods results which may be expressed by

    x^{k+1} = x^k + F'(x^k)^{-1} G'(z^k)(z^* - z^k) ,  (5.11a)

where

    z^k = G^{-1}(F(x^k)) ,  (5.11b)

and

    z^* = G^{-1}(0) .  (5.11c)

Newton's method is obtained for

    G(z) = (z_1, z_2, ..., z_n)^T ,  (5.12)

since z^* = 0, and z^k = F(x^k). Also, the function

    G(z) = (z_1^p, z_2^p, ..., z_n^p)^T ,  (5.13)

yields the "damped" Newton's iteration given by

    x^{k+1} = x^k - p F'(x^k)^{-1} F(x^k) .  (5.14)
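
To make the role of G(z) in (5.11) concrete, the following Python sketch applies one step of the class to a scalar equation. It is an illustration under stated assumptions: the helper name class_step, the test equation f(x) = x^2 - 2, the starting point x = 2, and the value p = 0.5 are all hypothetical choices, not from the text. The fractional power in (5.13) is evaluated here only at a point where f(x) > 0.

```python
def class_step(x, f, f_prime, G_inv, G_prime):
    """One step of (5.11) for a scalar equation f(x) = 0 (illustrative helper)."""
    z_k = G_inv(f(x))        # (5.11b)
    z_star = G_inv(0.0)      # (5.11c)
    # (5.11a): x + F'(x)^{-1} G'(z_k) (z_star - z_k)
    return x + G_prime(z_k) * (z_star - z_k) / f_prime(x)

def f(x):
    return x * x - 2.0       # hypothetical test equation

def f_prime(x):
    return 2.0 * x

# G(z) = z, as in (5.12), reproduces Newton's method.
x_newton = class_step(2.0, f, f_prime, lambda y: y, lambda z: 1.0)

# G(z) = z**p, as in (5.13), reproduces the damped iteration (5.14).
p = 0.5
x_damped = class_step(
    2.0, f, f_prime,
    lambda y: y ** (1.0 / p),         # inverse of G
    lambda z: p * z ** (p - 1.0),     # derivative of G
)

print(x_newton, 2.0 - f(2.0) / f_prime(2.0))
print(x_damped, 2.0 - p * f(2.0) / f_prime(2.0))
```

Both printed pairs agree, which is the scalar form of the statement that (5.12) and (5.13) lead to (5.8) and (5.14) respectively.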


It is often beneficial to choose a G(z) which resembles the
behavior of the function being solved. For example, consider the scalar
problem

    f(x) = x - 3 + 10^{-12} (e^{x/.026} - 1) = 0 ,  (5.15)

which results from a simple circuit composed of a 3-volt battery in
series with a 100-ohm resistor and a diode whose current is given by
10^{-14} (e^{x/.026} - 1), where x is the voltage across the diode. Newton's
iteration for solving (5.15) is given by

    x_{k+1} = x_k - f(x_k) / f'(x_k) .  (5.16)

(Note the use of subscripts for scalars.) From the initial guess
x_0 = 2, (5.16) yields x_1 = 1.974, and computing the solution
x^* = .7396105 requires over 40 iterations. The excessive number of
iterations is due to the fact that Newton's iteration assumes the
function to be linear at each x_k, while (5.15) has an exponential
behavior for the initial guess given and over most of the iterations.
The iterative method given by

    x_{k+1} = x_k - [(f(x_k) + 1) / f'(x_k)] ln|1 + f(x_k)|  (5.17)

is obtained from (5.11) for the g(z) function given by

    g(z) = e^z - 1 .  (5.18)

From x_0 = 2, using (5.17) for problem (5.15) yields x_1 = .7184066. If
we then switch to Newton's iteration, since x_1 is close to x^*,
convergence is achieved in 5 iterations. It might be concluded that g(z)
should be selected according to the behavior of the function about the
current estimate of the solution x_k. We will return to this conclusion
later.
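
The scalar diode example can be reproduced numerically. The Python sketch below is an illustration, not the author's program: the function names, the stopping tolerance of 1e-9 on the step size, and the iteration cap are assumptions, so the iteration counts it reports need not match the counts quoted above exactly.

```python
import math

VT = 0.026      # exponent scale appearing in (5.15)
A = 1.0e-12     # coefficient of the exponential term in (5.15)

def f(x):
    return x - 3.0 + A * (math.exp(x / VT) - 1.0)      # (5.15)

def f_prime(x):
    return 1.0 + (A / VT) * math.exp(x / VT)

def newton_step(x):
    return x - f(x) / f_prime(x)                       # (5.16)

def tailored_step(x):
    fx = f(x)
    # (5.17), obtained from (5.11) with g(z) = exp(z) - 1
    return x - (fx + 1.0) / f_prime(x) * math.log(abs(1.0 + fx))

def newton_solve(x, tol=1e-9, max_iter=200):
    """Iterate (5.16) until the step is below tol (assumed criterion)."""
    for n in range(1, max_iter + 1):
        x_new = newton_step(x)
        if abs(x_new - x) < tol:
            return x_new, n
        x = x_new
    raise RuntimeError("Newton's iteration did not converge")

x_newton, n_newton = newton_solve(2.0)      # pure Newton from x_0 = 2
x1 = tailored_step(2.0)                     # one tailored step from x_0 = 2
x_mixed, n_mixed = newton_solve(x1)         # then switch to Newton

print("first Newton iterate:  ", newton_step(2.0))
print("first tailored iterate:", x1)
print("Newton iterations from x_0 = 2:", n_newton)
print("Newton iterations after one tailored step:", n_mixed)
```

The single tailored step removes almost all of the distance that Newton's iteration covers in steps of roughly .026 while the exponential term dominates.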

Extending the preceding example to n dimensions, one could select
each individual function in G(z) according to the behavior of its
corresponding function in F(x). We will call these methods tailored
iterative methods. For example, a two-dimensional equivalent of problem
(5.15) is

    f_1(x) = x_1 + 100 x_2 - 3 = 0 ,  (5.19a)

    f_2(x) = 10^{-14} (e^{x_1/.026} - 1) - x_2 = 0 ,  (5.19b)

where x_1 is the voltage across the diode, and x_2 is the current in the
circuit, and the solution is x^* = (x_1^*, x_2^*)^T = (.7396105, .0226)^T.
Using Newton's method with the initial guess x^0 = (3, 0)^T yields
x^1 = (2.974, 2.6 x 10^{-4})^T, and over 50 iterations are required to
converge to x^*. If we select

    G(z) = (z_1, e^{z_2} - 1)^T ,  (5.20)

the iterative method that results from (5.11) is given by

    x^{k+1} = x^k - F'(x^k)^{-1} \hat{F}(x^k) ,  (5.21)

where

    \hat{F}(x^k) = (f_1(x^k), [1 + f_2(x^k)] ln|1 + f_2(x^k)|)^T .

For the initial guess x^0 = (3, 0)^T, (5.21) yields
x^1 = (.83814, .02162)^T, and then, since x^1 is now close to x^*, after
switching to Newton's method convergence is achieved in less than 10
iterations. Therefore, similar improvements are possible in n
dimensions.
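
The two-dimensional tailored iteration can be checked in the same way. In the Python sketch below, the 2 x 2 linear solve by Cramer's rule, the function names, and the stopping tolerance are assumptions made to keep the example self-contained; a circuit program would use a sparse LU factorization instead.

```python
import math

VT, IS, R, V = 0.026, 1.0e-14, 100.0, 3.0   # constants of (5.19)

def F(x):
    x1, x2 = x
    return (x1 + R * x2 - V,                          # (5.19a)
            IS * (math.exp(x1 / VT) - 1.0) - x2)      # (5.19b)

def jacobian(x):
    x1, _ = x
    return ((1.0, R),
            ((IS / VT) * math.exp(x1 / VT), -1.0))

def solve2(J, b):
    # Cramer's rule for a 2 x 2 system J d = b (adequate for this sketch).
    (a, b12), (c, d22) = J
    det = a * d22 - b12 * c
    return ((b[0] * d22 - b12 * b[1]) / det,
            (a * b[1] - c * b[0]) / det)

def newton_step(x):
    d = solve2(jacobian(x), F(x))
    return (x[0] - d[0], x[1] - d[1])

def tailored_step(x):
    f1, f2 = F(x)
    # Right-hand side of (5.21) for G(z) = (z_1, exp(z_2) - 1)^T
    F_hat = (f1, (1.0 + f2) * math.log(abs(1.0 + f2)))
    d = solve2(jacobian(x), F_hat)
    return (x[0] - d[0], x[1] - d[1])

def newton_solve(x, tol=1e-9, max_iter=500):
    for n in range(1, max_iter + 1):
        x_new = newton_step(x)
        if max(abs(x_new[0] - x[0]), abs(x_new[1] - x[1])) < tol:
            return x_new, n
        x = x_new
    raise RuntimeError("Newton's iteration did not converge")

x_n1 = newton_step((3.0, 0.0))
x_t1 = tailored_step((3.0, 0.0))
_, n_newton = newton_solve((3.0, 0.0))
x_sol, n_mixed = newton_solve(x_t1)
print(x_n1, x_t1, n_newton, n_mixed, x_sol)
```

Because f_1 is linear and G leaves its component unchanged, only the second component of the right-hand side differs from Newton's method.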

In principle, one could select an appropriate G(z) function at each
iteration. It does not seem possible, however, that a general algorithm
can be developed to achieve the appropriate selection in an automatic
way. For example, in problem (5.19), at the initial guess x^0 = (0, 0)^T,
the second function (5.19b) does not have an exponential behavior. In
fact, the behavior of both functions in (5.19) is linear about this x^0.
Thus it would seem that Newton's iteration is appropriate. However,
Newton's iteration yields x^1 \approx (3, 0)^T, which is farther from x^* than
x^0 is. Since we know the solution x^* for this problem,
G(z) = (z_1^{1/4}, z_2)^T, for example, would have been an appropriate G(z)
for the initial guess x^0 = (0, 0)^T.


5.2.2  The  Variable-Order  Iterative  Method 


In this section the Variable-Order (VO) iterative method is derived,
which is mainly suitable when initial guesses to the solution x^* of
(5.2) are close to x^*. The method is based on the infinite series that
results for the G(z) function given by

    G(z) = (z_1, z_2, ..., z_n)^T .  (5.22)

For this G(z) function, (5.10) becomes

    x^* = x^k - D_2^k - D_3^k - D_4^k - ... ,  (5.23a)

where

    D_2^k = F'(x^k)^{-1} F(x^k) ,  (5.23b)

    D_3^k = (1/2) F'(x^k)^{-1} [F''(x^k) D_2^k] D_2^k ,  (5.23c)

    D_4^k = F'(x^k)^{-1} [ [F''(x^k) D_2^k] D_3^k
              - (1/6) [[F'''(x^k) D_2^k] D_2^k] D_2^k ] ,  (5.23d)

are called the second-, third-, fourth-, etc., order corrections
respectively. As stated in Section 3.3, this infinite series is a well-known
and extensively studied result [19,47,39,50] extended to n dimensions.
The important property of (5.23) is that iterative methods obtained
from the series have increasingly higher order of convergence, for most
functions [47,39,50], as more terms are retained from the series. For
example,

    x^{k+1} = x^k - D_2^k ,  (5.24)

obtained from (5.23) by retaining the first two terms, is the well-known
Newton iteration, which has second-order convergence. The iteration

    x^{k+1} = x^k - D_2^k - D_3^k ,  (5.25)


obtained from (5.23) by retaining three terms, has third-order
convergence. Retaining four terms yields

    x^{k+1} = x^k - D_2^k - D_3^k - D_4^k ,  (5.26)

which has fourth-order convergence. Clearly any iterative method with
even higher order can be obtained similarly, assuming all the higher
derivatives and the inverse of the Jacobian exist. However, it was
determined experimentally that, both because of errors in computing (or
approximating) the corrections of order higher than four and because
no improvement in efficiency was observed, iterative methods of order
higher than four need not be considered. The third-order correction may be
approximated by

    D_3^k \approx F'(x^k)^{-1} F(x_2^k) ,  (5.27a)

where

    x_2^k = x^k - D_2^k  (5.27b)

is the second-order point (or Newton point). This approximation
follows from the Taylor series expansion of F(x_2^k) about x^k, as was shown
in Section 3.3.1. Similarly, the fourth-order correction may be
approximated by (see Section 3.3.1)

    D_4^k \approx F'(x^k)^{-1} F(x_3^k) ,  (5.28a)

where

    x_3^k = x^k - D_2^k - D_3^k  (5.28b)

is the third-order point. The iterative methods (5.25) and (5.26),
using the above approximations, can be thought of as Newton's iteration
without evaluating the Jacobian matrix at each iteration. Traub [50]
proposed and studied a similar iterative method derived differently.
It can be shown that the order of convergence for the iterative methods
still remains the same even with the preceding approximations [50,38].
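
The approximations (5.27) and (5.28) can be tried on the scalar diode problem (5.15). The Python sketch below is only an illustration under assumptions: the starting point x = 0.75 and all function names are hypothetical, and the value of X_STAR is the solution quoted in Section 5.2.1. The derivative is evaluated once and reused for all three corrections, and the approximate third-order correction is compared with the exact scalar form of (5.23c).

```python
import math

VT = 0.026
A = 1.0e-12
X_STAR = 0.7396105   # solution of (5.15) quoted in the text

def f(x):
    return x - 3.0 + A * (math.exp(x / VT) - 1.0)

def f1(x):
    return 1.0 + (A / VT) * math.exp(x / VT)       # first derivative

def f2(x):
    return (A / VT ** 2) * math.exp(x / VT)        # second derivative

def vo_points(x):
    """Second-, third- and fourth-order points with one derivative evaluation."""
    slope = f1(x)            # evaluated once at x^k, then reused
    d2 = f(x) / slope        # second-order correction
    x2 = x - d2              # (5.27b)
    d3 = f(x2) / slope       # (5.27a)
    x3 = x2 - d3             # (5.28b)
    d4 = f(x3) / slope       # (5.28a)
    x4 = x3 - d4
    return (x2, x3, x4), (d2, d3, d4)

x_k = 0.75                   # hypothetical estimate near the solution
points, corrections = vo_points(x_k)
d3_exact = 0.5 * f2(x_k) * corrections[0] ** 2 / f1(x_k)   # (5.23c), scalar case
errors = [abs(p - X_STAR) for p in points]

print("errors of x_2, x_3, x_4:", errors)
print("approximate D_3:", corrections[1], " exact D_3:", d3_exact)
```

Each additional correction costs one function evaluation and one division by the stored slope, which is the scalar analogue of reusing a factored Jacobian.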

The VO iterative method for solving (5.2) may be briefly described
as follows (a more detailed implementation is given later):

STEP 1:   Set k = 0, and obtain x^0.

STEP 2:   Compute D_2^k.

STEP 3:   If ||D_2^k|| is small, go to STEP 5.

STEP 4:   Set x^{k+1} = x^k - p D_2^k, for some 0 < p < 1, also set
          k = k + 1, and go to STEP 2.

STEP 5:   If ||D_2^k|| is sufficiently small to satisfy a convergence
          criterion, set x^{k+1} = x^k - D_2^k, and go to STEP 8.
          Otherwise set r = 3 and continue.

STEP 6:   Compute D_r^k or its approximation.

STEP 7:   If ||D_r^k|| is sufficiently small to satisfy the convergence
          criterion, set x^{k+1} = x^k - D_2^k - ... - D_r^k, and go to
          STEP 8. Otherwise set r = r + 1, and if r <= 4, go to STEP 6;
          otherwise go to STEP 2.

STEP 8:   Done. Approximation to a solution x^* of (5.2) is x^{k+1}.

The basic idea of the above algorithm is the following. Use Newton's
iteration, or its "damped" modification (5.14), or a tailored iterative
method as described in Section 5.2.1, until x^k is close to x^*. Then
higher-order corrections are computed until x^* is approximated to the
desired accuracy. The assumption underlying the VO iteration is that
the higher-order corrections are more efficiently computed than the
second-order correction D_2^k. This assumption is justified next.

Since we wish to implement this method in a circuit analysis
program, the way Newton's iteration is normally implemented will be briefly
reviewed now. The vector function F(x) for circuit analysis
applications normally has three partitions given by

    F(x) = (F_H(x), F_L(x), F_N(x))^T ,  (5.29a)

with F_H a linear and homogeneous partition given by

    F_H(x) = J_H x ,  (5.29b)

and F_L a linear partition given by

    F_L(x) = J_L x - c_L ,  (5.29c)

where J_H and J_L are constant and sparse matrices, c_L is a constant
vector, and F_N(x) are the nonlinear functions in F(x). If F(x)
represents the circuit equations, the number of functions in F_N is
usually very small compared with n, where n is the total number of
functions in F(x). Newton's iteration is rarely implemented as written
in (5.24). Most texts recommend solving the system of equations given
by

    F'(x^k) \Delta x^k = -F(x^k) ,  (5.30a)

where

    \Delta x^k = x^{k+1} - x^k .  (5.30b)

Then (5.30b) may be used to obtain x^{k+1}. If F(x) is given by (5.29),
then (5.30) becomes

    \begin{bmatrix} J_H \\ J_L \\ F_N'(x^k) \end{bmatrix} \Delta x^k
      = - \begin{bmatrix} J_H x^k \\ J_L x^k - c_L \\ F_N(x^k) \end{bmatrix} .  (5.31)

However, using (5.30b), we can write (5.31) as follows:

    \begin{bmatrix} J_H \\ J_L \\ F_N'(x^k) \end{bmatrix} x^{k+1}
      = \begin{bmatrix} 0 \\ c_L \\ F_N'(x^k) x^k - F_N(x^k) \end{bmatrix} ,  (5.32)

which is computationally more efficient than (5.31) if both the number
of functions in F_N(x) is small and the product F_N'(x^k) x^k can be
evaluated taking into consideration the sparsity of F_N'(x^k). Modern
circuit analysis programs implement Newton's iteration by (5.32) [27,52].
In the solution of (5.32), advantage is taken of both the zero elements
and the constant elements in the system of equations. While (5.32) will
suffice for our purposes in this chapter, additional partitions can be
introduced which, when taken advantage of, make the iteration extremely
efficient [27,17]. The system of equations (5.32) may be expressed as

    \begin{bmatrix} J_H \\ J_L \\ F_N'(x^k) \end{bmatrix} x_2^k
      = \begin{bmatrix} 0 \\ c_L \\ F_N'(x^k) x^k - F_N(x^k) \end{bmatrix} ,  (5.33)

where x_2^k is the second-order point (5.27b). The second-order correction
is then given by

    D_2^k = x^k - x_2^k .  (5.34)


The third-order point, x_3^k, is defined in (5.28b). Using (5.27b),
equation (5.28b) solved for x_2^k and substituted into (5.33) yields

    \begin{bmatrix} J_H \\ J_L \\ F_N'(x^k) \end{bmatrix} (x_3^k + D_3^k)
      = \begin{bmatrix} 0 \\ c_L \\ F_N'(x^k) x^k - F_N(x^k) \end{bmatrix} .  (5.35)

Using (5.23c) we have

    \begin{bmatrix} J_H \\ J_L \\ F_N'(x^k) \end{bmatrix} D_3^k
      = \begin{bmatrix} 0 \\ 0 \\ (1/2)[F_N''(x^k) D_2^k] D_2^k \end{bmatrix} .  (5.36)

Note that for the partition of F(x) in (5.29) we have

    F''(x) = (0, 0, F_N''(x)) .  (5.37)

Substitution of (5.36) into (5.35) yields the third-order point as the
solution of

    \begin{bmatrix} J_H \\ J_L \\ F_N'(x^k) \end{bmatrix} x_3^k
      = \begin{bmatrix} 0 \\ c_L \\
          F_N'(x^k) x^k - F_N(x^k) - (1/2)[F_N''(x^k) D_2^k] D_2^k \end{bmatrix} .  (5.38)

Then

    D_3^k = x_2^k - x_3^k .

If the term (1/2)[F_N''(x^k) D_2^k] D_2^k can be computed efficiently by
taking into consideration the sparsity of F_N''(x^k), setting up and
solving (5.38) can be done more efficiently than setting up and solving
for another second-order point by (5.33). The reason is that the
factorization of the Jacobian in (5.38) is identical to that in (5.33)
and most of the right-hand side is also the same. The fourth-order point
given by

    x_4^k = x^k - D_2^k - D_3^k - D_4^k = x_3^k - D_4^k  (5.39)

can be similarly derived to be the solution of

    \begin{bmatrix} J_H \\ J_L \\ F_N'(x^k) \end{bmatrix} x_4^k
      = \begin{bmatrix} 0 \\ c_L \\ B \end{bmatrix} ,  (5.40)

where

    B = F_N'(x^k) x^k - F_N(x^k) - (1/2)[F_N''(x^k) D_2^k] D_2^k
          - [F_N''(x^k) D_2^k] D_3^k
          + (1/6)[[F_N'''(x^k) D_2^k] D_2^k] D_2^k .

Then

    D_4^k = x_3^k - x_4^k .

If the number of nonlinear functions in F_N(x) is small and if the added
term in the right-hand side of (5.40) can be computed by taking into
consideration the sparsity of F_N''(x^k) and F_N'''(x^k), the fourth-order
point may be computed more efficiently than a new second-order point.

If instead of computing the exact third-order correction we use
the approximation (5.27), the third-order point can be derived in a
straightforward manner to be the solution of

    \begin{bmatrix} J_H \\ J_L \\ F_N'(x^k) \end{bmatrix} x_3^k
      = \begin{bmatrix} 0 \\ c_L \\ F_N'(x^k) x_2^k - F_N(x_2^k) \end{bmatrix} .  (5.41)

Clearly, setting up (5.41) and solving it for x_3^k can be done more
efficiently than setting up (5.33) and solving it for an entirely new x_2^k
(that is, a pure Newton's iteration). The approximate fourth-order
point, using (5.28), can also be derived in a straightforward manner
to be the solution of

    \begin{bmatrix} J_H \\ J_L \\ F_N'(x^k) \end{bmatrix} x_4^k
      = \begin{bmatrix} 0 \\ c_L \\ F_N'(x^k) x_3^k - F_N(x_3^k) \end{bmatrix} .  (5.42)

We can now give the computationally efficient VO iterative method
for solving (5.2) with F(x) partitioned as in (5.29). In the steps
below, d_{r,i}^k and x_{r,i}^k denote the i-th components of D_r^k and
x_r^k.

VO Iterative Method.

STEP 1:   Set k = 0, obtain x^0, \epsilon_{ca} and \epsilon_{cr} used in the
          convergence tests (typically \epsilon_{ca} = 5 x 10^-[exponent
          illegible] and \epsilon_{cr} = 5 x 10^-[exponent illegible] gave
          good results), and MAXIT (the maximum number of iterations
          to be done whether convergence is achieved or not).

STEP 2:   Evaluate the Jacobian and the right-hand side in (5.33)
          at the point x^k.

STEP 3:   Compute the LU factorization of the Jacobian [33]. This
          step and the preceding one can be made extremely efficient
          if the factorization is partitioned so that the constant
          part of the factorization is done only once in STEP 1,
          and if the generally numerous zeros in the Jacobian are
          taken into consideration [27].

STEP 4:   Compute the second-order point x_2^k by forward and back
          substitution. If the nonlinear functions F_N only depend
          on a subset of the unknown vector x, then the back
          substitution can be designed to take advantage of this fact
          [27].

STEP 5:   If |d_{2,i}^k| = |x_i^k - x_{2,i}^k| < \epsilon_{ca} + \epsilon_{cr} |x_i^k|
          for all i = 1, ..., n, then it is assumed that x_2^k is an
          approximation to x^*; thus set x^{k+1} = x_2^k and go to STEP 15.

STEP 6:   If |d_{2,i}^k| > 0.5 and |d_{2,i}^k| > 0.5 |x_i^k|, for any
          i = 1, ..., n, then it is assumed that x^k is far from a
          solution x^*; thus set x^{k+1} = x_2^k and go to STEP 14.

STEP 7:   Evaluate the right-hand side of either (5.38) or (5.41),
          which defines x_3^k or its approximation.

STEP 8:   Compute x_3^k or its approximation by forward and back
          substitution of (5.38) or (5.41). See STEP 4 for an
          applicable comment.

STEP 9:   If |d_{3,i}^k| = |x_{2,i}^k - x_{3,i}^k| <= \epsilon_{ca} or
          |d_{3,i}^k| < .01 |d_{2,i}^k|, for all i = 1, ..., n, then x_3^k
          is an approximation to x^*; thus set x^{k+1} = x_3^k and go to
          STEP 15.

STEP 10:  If |d_{3,i}^k| > \epsilon_{ca} and |d_{3,i}^k| > .1 |d_{2,i}^k|, for
          any i = 1, ..., n, then it is assumed that the infinite
          series (5.23) is not converging; thus set x^{k+1} = x_2^k and
          go to STEP 14.

STEP 11:  Evaluate the right-hand side of either (5.40) or (5.42),
          which defines x_4^k or its approximation.

STEP 12:  Compute x_4^k or its approximation by forward and back
          substitution of (5.40) or (5.42). See STEP 4 for an
          applicable comment.

STEP 13:  If |d_{4,i}^k| = |x_{3,i}^k - x_{4,i}^k| <= \epsilon_{ca} or
          |d_{4,i}^k| < .01 |d_{3,i}^k|, for all i = 1, ..., n, then x_4^k
          is an approximation to x^*; thus set x^{k+1} = x_4^k and go to
          STEP 15. Otherwise set x^{k+1} = x_4^k and continue.

STEP 14:  Set k = k + 1, and if k <= MAXIT go to STEP 2; otherwise
          convergence could not be achieved in MAXIT iterations,
          thus exit with error.

STEP 15:  Done. The point x^{k+1} should be an approximation to x^*.

The VO iteration is ideally suited for solving the nonlinear equations
which arise at each step when implicit integration methods [25,4,51]
are used in the solution of nonlinear differential equations. As will
be shown, these problems can supply good initial guesses to the
solution of the nonlinear equations.

5.3  The  VO  Iterative  Method  in  Transient  Analysis 

As  shown  in  Chapter  2,  the  circuit  equations  may  be  conveniently 
expressed  by  a  system  of  algebraic  and  differential  equations  given  by 

l^X.'    %'  4'  '^^  =  ^   g<tQ)  =  go   ,  (5.43) 

where  w(t)  is  the  vector  of  branch  voltages,  branch  currents  and  node 
voltages,  q(t)  is  the  vector  of  capacitor  charges  and  inductor  fluxes, 
q(t)  is  the  time  derivative  of  q(t),  and  t  is  the  independent  variable 
time. The vector function F comprises Kirchhoff's voltage and current
law equations, and the branch constitutive equations.

The transient analysis of the circuit equations is defined to be
the procedure by which the vectors w(t) and q(t) are computed to satisfy
(5.43) for all values of time in the interval $t_0 \le t \le T$. For a large
number of circuits, the differential equations in (5.43) are stiff
[8,18,25]; that is, the time constants of the circuit are widely
separated. This stiffness has forced most recent transient analysis
programs to use implicit numerical integration methods [27,3,8,52] in
computing approximations to w(t) and q(t).

In general, the solutions w(t) and q(t) are approximated at discrete
values of time $t = t_0, t_1, \ldots, t_M$, where $t_M \ge T$. The
approximation is accomplished by the discretization of $\dot{q}(t)$ at each
time step $t = t_m$. This discretization generally consists of
approximating each component $q_i(t_m)$ by a polynomial passing through
the past values of $q_i(t)$. For implicit integration methods, the time
derivatives are generally discretized by

$$\dot{q}_i(t_m) = S\bigl(q_i(t_m), q_i(t_{m-1}), \ldots, q_i(t_{m-r}), h_m, h_{m-1}, \ldots, h_{m-r+1}\bigr)$$

where $h_m = t_m - t_{m-1}$ is called the $m$th step size, and the function S
depends on the integration method. For example, the backward Euler
discretization method [18,8] is given by

$$\dot{q}_i(t_m) = \frac{q_i(t_m) - q_i(t_{m-1})}{h_m}.$$

Thus at each time point $t = t_m$, after the discretization of each
component of $\dot{q}(t_m)$, the circuit equations become

$$F(w(t_m), q(t_m)) = 0, \tag{5.44}$$

which is a nonlinear system of equations to be solved for $w(t_m)$ and
$q(t_m)$. For sufficiently small $h_m$, the solution of (5.44) at each time
step is in general very close to the solution of (5.44) at the preceding
time step (except perhaps at the first time point, which is considered
in Section 5.4). Therefore, very good initial estimates of a solution
of (5.44) can usually be obtained.
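As a minimal sketch of the backward Euler rule (not the thesis program), the following Python applies it to a hypothetical stiff scalar test equation dq/dt = -lam*(q - 1). Because this test equation is linear, the implicit equation at each step can be solved in closed form, so no Newton iteration is needed here. The function name `backward_euler_linear` and the values of `lam` and `h` are illustrative assumptions.

```python
def backward_euler_linear(q0, lam, h, steps):
    """Integrate dq/dt = -lam * (q - 1) with the backward Euler rule.

    The rule replaces dq/dt at t_m by (q_m - q_{m-1}) / h, which gives
    (q_m - q_{m-1}) / h = -lam * (q_m - 1); this is solved for q_m.
    """
    q = q0
    history = [q]
    for _ in range(steps):
        q = (q + h * lam) / (1.0 + h * lam)
        history.append(q)
    return history


# The step size is 100 times the time constant 1/lam, yet the
# iterates approach the steady state q = 1 without oscillation.
trajectory = backward_euler_linear(q0=0.0, lam=1.0e6, h=1.0e-4, steps=5)
print(trajectory[-1])
```

An explicit rule with the same step size would diverge on this problem, which is the practical reason stiff circuit equations are integrated implicitly.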

There are many algorithms for determining the number of past values
of q(t) to use in the discretization of $\dot{q}(t_m)$ (the order of the
approximating polynomial), and for determining the step size to ensure
accuracy [25,4,51]. These algorithms can generally be described by the
following general transient iteration.

General Transient Iteration.

STEP 1:  Obtain $t_0$, $h_1$ (the initial step size), T, $q_0$ (the initial
         conditions, which generally require a dc analysis). Set
         m = 1.

STEP 2:  Set $t_m = t_{m-1} + h_m$.

STEP 3:  Discretization of $\dot{q}(t_m)$. This step depends on the
         implicit integration method being used.

STEP 4:  Extrapolate from the previous time step values to obtain
         an initial guess to the desired solution of (5.44).

STEP 5:  Solve (5.44) for $w(t_m)$ and $q(t_m)$. Normally a maximum of
         five iterations are done in attempting to solve (5.44).
         If the convergence criterion is not met, then set
         $h_m = h_m/2$ and go to STEP 2. Otherwise continue.

STEP 6:  Estimate the truncation error in the discretization. This
         step also depends on the integration method, as STEP 3
         does. If the truncation error is too large, then set
         $h_m = h_m/2$ and go to STEP 2. Otherwise determine what the
         next step size $h_{m+1}$ should be.

STEP 7:  Display and/or save requested outputs at the current time,
         $t_m$. If $t_m < T$, then set m = m + 1 and go to STEP 2.
         Otherwise the transient analysis is complete.
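The seven steps can be sketched for a single scalar equation dq/dt = f(q, t) as follows. This is an illustrative Python sketch under stated assumptions, not the implementation used in the thesis: backward Euler is assumed for STEP 3, a plain Newton iteration for STEP 5, and the difference between the extrapolated guess and the converged value is used as a crude stand-in for the truncation error estimate of STEP 6. The name `transient_sketch`, the tolerances, and the test equation are all hypothetical.

```python
import math


def transient_sketch(f, dfdq, q0, t0, t_end, h0,
                     newton_tol=1e-10, lte_tol=1e-3, max_newton=5):
    """Backward Euler transient loop with step halving (STEPS 1-7)."""
    t, q, h = t0, q0, h0                      # STEP 1
    slope = f(q0, t0)                         # dq/dt at the initial point
    points = [(t, q)]
    while t < t_end:
        if h < 1e-12:
            raise RuntimeError("step size underflow")
        t_new = t + h                         # STEP 2
        guess = q + h * slope                 # STEP 4: linear extrapolation
        x, converged = guess, False
        for _ in range(max_newton):           # STEP 5: at most five Newton passes
            # STEP 3: backward Euler turns dq/dt = f into this residual
            g = (x - q) / h - f(x, t_new)
            dx = g / (1.0 / h - dfdq(x, t_new))
            x -= dx
            if abs(dx) <= newton_tol * max(1.0, abs(x)):
                converged = True
                break
        if not converged:
            h /= 2.0                          # halve h and return to STEP 2
            continue
        lte = abs(x - guess)                  # STEP 6: crude error estimate
        if lte > lte_tol:
            h /= 2.0                          # halve h and return to STEP 2
            continue
        slope = (x - q) / h
        t, q = t_new, x
        points.append((t, q))                 # STEP 7: save output
        if lte < lte_tol / 4.0:
            h *= 2.0                          # error is small: try a larger step
    return points


points = transient_sketch(
    f=lambda q, t: -50.0 * (q - math.cos(t)),
    dfdq=lambda q, t: -50.0,
    q0=0.0, t0=0.0, t_end=2.0, h0=1.0e-3)
print(len(points), points[-1])
```

In a circuit program the scalar Newton update would be replaced by an LU factorization of the Jacobian followed by forward and back substitution, which is where most of the STEP 5 cost arises.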

Most existing implementations of this general transient analysis
iteration use a Newton-like iteration to solve (5.44) at STEP 5 [3,6,9,
13,17,27,52]. We propose to use the VO iteration given in Section 5.2.2.
Based on experimental evidence, it is conjectured that the computer time
spent in STEP 5 is from 10% to 80% of the total computer time. The
actual fraction depends on many factors, such as: 1) the computer
time in STEP 5 depends directly on the efficiency of the implementation
of both the LU factorization of the Jacobian and the forward and
back substitution used to solve the linear system of equations;
2) the fraction of computer time spent in STEP 5 decreases as the amount
of requested output in STEP 7 increases; and 3) the computer time in
STEP 5 grows with the complexity of the circuit, as represented by
the circuit equations. For the AOP program, which uses an extremely
efficient Newton's iteration at STEP 5 [27,17], the measured computer
time spent in STEP 5 was from 14% to 45% of the total for the examples
reported in the rest of this chapter.

The AOP program does a maximum of five Newton iterations before
the step size is reduced, as noted in STEP 5. It was found
experimentally, however, that if the second-order or Newton correction,
$D_2^k$, was ever large, it was best not to continue with the iterations
(i.e., to reduce the step size right away). This check is implemented in
STEP 6 of the VO iterative method described in Section 5.2.2 as follows.
Instead of going to STEP 14 as indicated, if $|D_{2,i}^k| > 1$ and
$|D_{2,i}^k| > |x_i^k|$ for any i = 1, ..., n, then a signal to reduce the
step size is made; otherwise go to STEP 14.

In the following, the AOP program results with Newton's iteration
at STEP 5 are labeled AOP; the results for the AOP program using the VO
iteration with exact corrections are labeled AOP-VO; and the results for
the AOP program using the VO iteration with approximate corrections are
labeled AOP-VO-A. In trying to make a definite statement about the
improvements in efficiency we are again faced (see Section 4.6.2) with
the fact that intervals of time cannot be measured accurately in the
IBM/370 Mod 165 computer used [24], perhaps because of the
multiprogramming environment. As will be seen, several inconsistencies
in the measured computer times make conclusions based solely on this
figure somewhat dubious. However, most measured computer times, which
are in seconds, indicate savings of 25% in STEP 5 on average. An
interesting result was that the total number of second-order or Newton
corrections in AOP, which is equal to the total number of passes in the
language of the program, was approximately equal to the sum of the
second-, third-, and fourth-order corrections in AOP-VO and in AOP-VO-A.
That is, the VO iterative method reduced the computational cost of
obtaining many of the corrections; it generally did not reduce the
number of them.

5.3.1  MOSFET  Nand  Gate  Example 

The simple nand gate described earlier in Section 4.6.11 and in
Fig. 4.6 is the first example. The two inputs are now set to a
time-dependent trapezoidal curve, instead of being constants of -6 volts
as in Fig. 4.6. Each input is initially constant at -6 volts. The rise
begins at 5 nsec and ends at 25 nsec, after which the inputs stay at 0
volts until they begin to fall at 1025 nsec; the fall ends at 1045 nsec,
after which they stay at -6 volts. The transient analysis was from 0 to
1500 nsec. The device models were as described in Section 4.6.11, with
the exception that the substrate terminal was not included and thus a
constant threshold voltage of 2 volts was used. A function subprogram
was used to supply the drain to source current and the appropriate
partial derivatives.

The results for AOP and AOP-VO are summarized in Table 5.1. Observe
that while this is a relatively small transient analysis problem
by today's standards, a 19% savings in the measured computer time of
STEP 5 resulted. Note that AOP-VO required 10 more time steps than
AOP. As will be seen, in other examples AOP-VO required fewer time steps
than AOP. In numerical integration, slight differences in the solution
of the nonlinear equations at one time step cause different subsequent
time steps to be taken. However, the solutions of AOP and AOP-VO
were in agreement to three significant digits, as expected.
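The figures quoted for this example can be checked with a few lines of arithmetic. The Python sketch below assumes that the percent savings are computed relative to the AOP time and rounded to the nearest integer (the thesis does not state its rounding rule); the helper name `percent_savings` is hypothetical, and the numbers are transcribed from Table 5.1.

```python
def percent_savings(baseline, improved):
    """Percent reduction of `improved` relative to `baseline`, rounded."""
    return round(100.0 * (baseline - improved) / baseline)


# Measured computer times (seconds) from Table 5.1: AOP versus AOP-VO.
step5_savings = percent_savings(1.03, 0.83)
total_savings = percent_savings(3.30, 2.96)

# Correction counts from Table 5.1: AOP uses only second-order (Newton)
# corrections; AOP-VO mixes second-, third- and fourth-order corrections.
aop_corrections = 180
aop_vo_corrections = 110 + 57 + 10

print(step5_savings, total_savings, aop_corrections, aop_vo_corrections)
```

The two correction totals differ by only three, which illustrates the remark that the VO iteration lowers the cost of many corrections rather than their number.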

5.3.2  MOSFET  Buffer  Examples 

The next two examples are MOS buffers shown in Figs. 5.1 and 5.2
[24]. The input voltage, VIN, is a trapezoid with initial level at
-6 V; the rise begins at 5 nsec and ends at 25 nsec, after which it
stays at 0 V; it then begins to fall at 725 nsec and reaches -6 V at
745 nsec, where it stays. The MOSFET device models are the same as the
ones described in Section 4.6.11 with the following widths (w) and
lengths (ℓ) in units of mils: T1, w = 88.9, ℓ = 5.08; T3, w = 12.7,
ℓ = 7.62; T4 and T5, w = 10.76, ℓ = 10.16; T6, w = 10.16, ℓ = 7.62;
T7, w = 7.62, ℓ = 35.56; T9, w = 72.39, ℓ = 5.08; T10 and T11,
w = 152.4, ℓ = 5.08.

The nonlinear drain to source currents are functions of three
voltages, and therefore there are several nonzero second partial
derivatives. It was determined that in this case computing the exact
fourth-order point by (5.40) was not efficient. Thus AOP-VO did not
obtain the fourth-order corrections. However, all of the higher order
corrections can always be approximated efficiently.

The results of a transient analysis from 0 to 1500 nsec are
summarized in Tables 5.2 and 5.3. Observe that these two experiments
indicate that the savings tend to be larger as the size of the circuit
increases, as was suggested earlier.

TABLE 5.1  Results of transient analysis for two input nand gate of
           Fig. 4.6. Percent savings are given in parentheses.

Counters                    AOP      AOP-VO
Time Steps                  83       93
2nd Order Corrections       180      110
3rd Order Corrections       0        57
4th Order Corrections       0        10
STEP 5 Computer Time        1.03     .83 (19%)
STEPS 1-7 Computer Time     3.30     2.96 (10%)

Figure 5.1  (a) Nine-device MOS buffer circuit analyzed. (b) Circuit
            diagram of the buffer block. Resistance units are in kΩ,
            and capacitance units are in pF. The MOS device model is
            given in Fig. 4.6b.

Figure 5.2  Eighteen-device MOS buffer analyzed. The circuit diagram
            of the buffer block is given in Fig. 5.1b.

5.3.3  ECL  Gate  Examples 

This is a bipolar example consisting of an ECL gate [37] shown in
Fig. 5.3. The nonlinear capacitances (in pF) and current sources
(in mA) of the device model shown in Fig. 5.3b are given by [26]

$$JE = 1.61 \times 10^{-13}\,(e^{39\,VCE} - 1),$$

$$JC = 7.2 \times 10^{[\ldots]}\,(e^{39\,VCC} - 1),$$

$$CE = 1.2/(0.9 - VCE)^{[\ldots]} + 6240\,(JE + 1.61 \times 10^{-13}),$$

$$CC = 1.5/(0.9 - VCC)^{[\ldots]} + 2.72 \times 10^{[\ldots]}\,(JC + 7.2 \times 10^{[\ldots]}).$$

The three inputs, V1, V2, and V3, are each set to a trapezoid function
of time. The initial level was 3.5 V; the rise begins at 5 nsec and ends
at 30 nsec, at which point the level is 5.2 V; the fall begins at 80
nsec and ends at 105 nsec, when the level is again 3.2 V. The transient
analysis was from 0 to 200 nsec. Two runs were made, which are
summarized in Tables 5.4 and 5.5. The second run differed from the first
one in that all of the capacitor equations in the models were multiplied
by a constant factor of $10^3$. These larger capacitors produce a much
more difficult transient analysis. Observe that the savings were larger
for the more complex analysis, as was suggested earlier.

TABLE 5.2  Results of transient analysis for 9-device MOS buffer of
           Fig. 5.1. Percent savings are given in parentheses. AOP-VO
           results are for using the second and third order corrections
           only, as the fourth-order exact corrections were too costly
           to evaluate.

Counters                    AOP      AOP-VO         AOP-VO-A
Time Steps                  113      115            115
2nd Order Corrections       227      134            130
3rd Order Corrections       0        98             97
4th Order Corrections       0        0              5
STEP 5 Computer Time        .55      .42 (24%)      .53 (4%)
STEPS 1-7 Computer Time     3.84     3.64 (5%)      3.75 (2%)

TABLE 5.3  Results of transient analysis for 18-device MOS buffer of
           Fig. 5.2. Percent savings are given in parentheses. AOP-VO
           results are for using the second and third order corrections
           only, as the fourth-order exact corrections were too costly
           to evaluate.

Counters                    AOP      AOP-VO         AOP-VO-A
Time Steps                  428      430            435
2nd Order Corrections       844      449            452
3rd Order Corrections       0        398            406
4th Order Corrections       0        0              3
STEP 5 Computer Time        3.79     2.47 (35%)     3.39 (10%)
STEPS 1-7 Computer Time     15.48    14.32 (7%)     15.03 (3%)

Figure 5.3  (a) ECL gate analyzed. (b) Bipolar device models.
            Resistance units are in kΩ.

TABLE 5.4  Results of transient analysis for 6 transistor bipolar ECL
           gate of Fig. 5.3. Percent savings are shown in parentheses.

Counters                    AOP      AOP-VO         AOP-VO-A
Time Steps                  112      113            110
2nd Order Corrections       255      158            136
3rd Order Corrections       0        119            119
4th Order Corrections       0        8              18
STEP 5 Computer Time        1.53     1.06 (31%)     1.20 (21%)
STEPS 1-7 Computer Time     3.80     3.23 (15%)     3.22 (15%)

TABLE 5.5  Results of transient analysis for 6 transistor bipolar ECL
           gate of Fig. 5.3, but with the nonlinear capacitor equations
           multiplied by $10^3$. This makes the circuit harder to analyze.
           Percent savings are shown in parentheses.

Counters                    AOP      AOP-VO         AOP-VO-A
Time Steps                  195      171            164
2nd Order Corrections       488      295            250
3rd Order Corrections       0        195            178
4th Order Corrections       0        11             5
STEP 5 Computer Time        2.90     1.89 (35%)     2.21 (24%)
STEPS 1-7 Computer Time     6.44     5.19 (19%)     4.41 (31%)

5.4  The  VO  Iterative  Method  in  DC  Analysis 

The dc analysis of a circuit represented by the circuit equations
(5.43) yields the values w and q for which $\dot{q} = 0$ in (5.43). Thus dc
analysis consists of solving the system of nonlinear equations (5.43)
with $\dot{q} = 0$, that is

$$F(w, q, 0, t_0) = 0. \tag{5.45}$$

The difficulty in solving (5.45) is that normally the initial guess
of a solution is poor.

The AOP program and other programs [52] use a pseudo-transient
analysis technique for solving (5.45). The method stems from the fact
that dc analysis may be defined as the behavior of a circuit under
time-invariant sources after such a long period of time that all
voltages and currents are constant. Thus, if any time-dependent sources
of the circuit are held constant at their $t = t_0$ value, a transient
analysis of (5.43) until $\dot{q}$ is small normally approximates the
solution of (5.45). This pseudo-transient analysis is of course not
concerned with intermediate results, and thus truncation errors are
allowed to be large in order to use larger pseudo time-steps. These
methods, called continuation methods by numerical analysts [38], are
quite reliable, but in general they converge to the dc solution very
slowly.
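The pseudo-transient idea can be sketched on a toy scalar problem. The Python below is an illustration under stated assumptions, not the AOP procedure: the "circuit" is the hypothetical equation r(x) = x^3 + x - 10 = 0 (root x = 2), embedded in the artificial dynamics dx/dt = -r(x); backward Euler is used for each pseudo time step, truncation error is ignored, and the step is enlarged by a fixed factor after every step. The name `pseudo_transient_dc`, the growth factor, and the tolerances are assumptions.

```python
def pseudo_transient_dc(residual, slope, x0, h0=0.1, growth=4.0,
                        rate_tol=1e-8, max_steps=200):
    """Drive dx/dt = -residual(x) to steady state with backward Euler.

    Each pseudo time step solves (x_new - x) / h + residual(x_new) = 0
    by Newton's method and then enlarges h, because the accuracy of the
    intermediate points is of no interest. The loop stops once the
    discretized rate |dx/dt| is below rate_tol.
    """
    x, h = x0, h0
    for step in range(1, max_steps + 1):
        x_new = x
        for _ in range(50):                   # inner Newton iteration
            g = (x_new - x) / h + residual(x_new)
            dx = g / (1.0 / h + slope(x_new))
            x_new -= dx
            if abs(dx) <= 1e-14 * max(1.0, abs(x_new)):
                break
        rate = (x_new - x) / h                # backward Euler estimate of dx/dt
        x = x_new
        if abs(rate) < rate_tol:
            return x, step
        h *= growth                           # take a larger pseudo time step
    raise RuntimeError("steady state not reached")


x_dc, steps_used = pseudo_transient_dc(
    residual=lambda x: x ** 3 + x - 10.0,
    slope=lambda x: 3.0 * x ** 2 + 1.0,
    x0=0.0)
print(x_dc, steps_used)
```

With small, accuracy-controlled steps the same loop would need many more steps to reach the steady state, which is the slowness noted in the text.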

We can of course add the VO iteration, as done in the last section,
to this pseudo-transient analysis method, and, as will be shown,
improvements similar to those reported in the last section are obtained.
However, an iterative method was also used, with good results, which
combines the pseudo-transient analysis method, the VO iteration, the
damped Newton iteration, and the tailored iterative method described
in Section 5.2.1.

The underlying idea of this combined iteration is to do a
pseudo-transient analysis with very large pseudo time-steps. Clearly,
the larger the pseudo step size is, the smaller $\dot{q}$ becomes and the
more closely the solution of (5.45) is approximated. Thus, we can still
describe the entire procedure by the general transient iteration given
in Section 5.3. The only exception is that the termination of the
procedure is not at some fixed value of time, but by testing the
smallness of $\dot{q}$ at each time step as done in [52]. STEP 5 of the
transient iteration, which is where the nonlinear equations are solved
at each time step, is once again the VO iterative method as described in
Section 5.2.2 with the following modifications. In STEP 2 of the VO
iteration, where the right-hand side of (5.33) is evaluated, if any of
the nonlinear functions has exponential behavior, then its corresponding
G(z) function is selected appropriately as illustrated in Section 5.2.1.
The effect of this selection is a tailored iterative method as
illustrated in (5.20) and (5.21), and it can be done straightforwardly
in the function subprogram which supplies the nonlinear function values
and their partial derivatives. The second modification to the VO
iteration is in STEP 6, where it is established when the second-order
correction is large. If the second-order correction is large, instead
of going to STEP 14, a damping parameter is automatically computed,
given by

$$\rho = \min\bigl\{1,\; 0.5\,\max[1, |x_i^k|]\,/\,|D_{2,i}^k|,\; i = 1, \ldots, n\bigr\}. \tag{5.46}$$

Then set

$$x^{k+1} = x^k - \rho\, D_2^k, \tag{5.47}$$

and go to STEP 14. Note that this is an automatic "damped" Newton
iteration. The damping (5.46) ensures that any component of $x^{k+1}$
differs from the corresponding component of $x^k$ by at most 50% (or by
at most .5 if the component is less than 1 in magnitude).
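A small sketch of the damping rule follows. It assumes that (5.46) reads rho = min{1, 0.5*max[1, |x_i|]/|D_i|, i = 1, ..., n}, where D is the second-order (Newton) correction, and that (5.47) is the update x_new = x - rho*D; the function names are hypothetical, and zero components of D are skipped because they place no limit on rho.

```python
def damping_parameter(x, d2):
    """Damping factor under the assumed reading of (5.46)."""
    rho = 1.0
    for xi, di in zip(x, d2):
        if di != 0.0:                      # a zero correction imposes no limit
            rho = min(rho, 0.5 * max(1.0, abs(xi)) / abs(di))
    return rho


def damped_newton_update(x, d2):
    """Apply x_new = x - rho * d2 and return the new point and rho."""
    rho = damping_parameter(x, d2)
    return [xi - rho * di for xi, di in zip(x, d2)], rho


# The first component is small and its correction is large, so it
# dictates the damping: rho = 0.5 * max(1, 0.2) / 4 = 0.125.
x_next, rho = damped_newton_update([0.2, 10.0], [4.0, -2.0])
print(rho, x_next)
```

In this example the first component moves by exactly .5 and the second by 0.25, so no component changes by more than half of max(1, |x_i|).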

Tables 5.6 and 5.7 summarize the results for two of the circuits
used in the last section. The AOP-VO-A+ column is for the combined
iteration described above. The combined iteration worked very well,
with similar results, for all the circuit examples of the last section.
For the 18-device MOS buffer, it was found that the dc solution given
by AOP was not as accurate as the dc solution obtained with the combined
iteration. Thus, the comparison of the results of Table 5.7 must take
this discrepancy into consideration (i.e., AOP should have done a much
longer pseudo-transient analysis for the equivalent accuracy of
AOP-VO-A+).

5.5  Summary 

In this chapter an algorithm based on the variable order concept
introduced in Chapter 3 was implemented to solve the nonlinear equations
that arise in transient analysis of circuits. Comparisons with an
existing and already very efficient method show modest improvements in
efficiency. Additionally, a combined iteration suitable for dc analysis
of circuits was able to solve several examples more efficiently than an
existing method based on solving the dc problem by a pseudo-transient
analysis.

TABLE 5.6  Results of dc analysis for 6 transistor bipolar ECL gate
           of Fig. 5.3.

Counters                    AOP      AOP-VO         AOP-VO-A+
Time Steps                  96       93             5
2nd Order Corrections       155      110            17
3rd Order Corrections       0        43             5
4th Order Corrections       0        7              0
STEP 5 Computer Time        1.01     .72 (30%)      .15 (85%)

TABLE 5.7  Results of dc analysis for 18-device MOS buffer of Fig. 5.2.
           It was found that the solution computed by AOP was
           incorrect; this must be considered when interpreting the
           results.

Counters                    AOP      AOP-VO-A       AOP-VO-A+
Time Steps                  78       78             6
2nd Order Corrections       147      92             92
3rd Order Corrections       0        61             11
4th Order Corrections       0        7              0
STEP 5 Computer Time        .63      .64 (-2%)      .54 (14%)

CHAPTER  6 
CONCLUSIONS  AND  FUTURE  RESEARCH  SUGGESTIONS 

In this chapter the contributions of this research are outlined.
The problems and shortcomings of the proposed new algorithms
and techniques are described, and some suggestions for their solution
and for further research are offered.

6.1  Conclusions

The main contribution of this research is the Variable-Order (VO)
algorithm for the minimization of a function of several variables,
derived in Chapters 3 and 4. The VO algorithm has two distinctively new
features: the order of convergence is variable and as high as four, and
the scalar problem in each iteration is based on the principle of moving
as far away as possible from the present point.

The order of convergence is an intrinsic property of the
transformation function. It was shown in Chapter 3 that the VO algorithm
is based on truncations of a Taylor series expansion of a point
satisfying a necessary condition for a solution. The general derivation
of this series appears to be novel, although particular forms of the
series have been previously obtained, as was mentioned. Most existing
algorithms have order of convergence less than or equal to two, while
the VO algorithm can converge with order as high as four.

The scalar search sub-problem was defined in two ways. First, when
the guess to the solution is very poor, the principle proposed was to
move as far away from the present point as possible, as long as the
function being minimized decreases in value. This principle seems to
be novel, and it might be useful in other existing algorithms. Second,
when the guess to the solution is very good or when the points generated
by the VO algorithm approach the solution, the usual principle of scalar
minimization is used.

The major features and properties of the VO algorithm may be
summarized as follows:

1)  A novel derivation based on a Taylor series expansion of a point
    satisfying necessary conditions.

2)  A novel scalar search problem at each iteration. The scalar
    search may also be along curved trajectories in the space of
    the independent variables, instead of always along straight
    lines as in all published algorithms.

3)  Since the order of convergence may be as high as four, extremely
    accurate solutions may be found in reasonable computer time.

4)  Numerical results indicate that the VO algorithm may be generally
    more efficient than other algorithms for minimizing functions
    that are twice continuously differentiable. The VO algorithm
    may also be more successful than other algorithms in avoiding
    convergence to saddle points.

5)  Extensions to include box constraints on the independent
    variables and to obtain approximations of high-order derivative
    terms make the VO algorithm potentially very useful.

On the other hand, several shortcomings of the VO algorithm can be
readily identified:

1)  The function to be minimized must be quite smooth to obtain the
    improved behavior of the VO algorithm. It was shown that the
    algorithm is globally convergent for twice continuously
    differentiable functions. The numerical results show that for
    functions not meeting these continuity conditions, the VO
    algorithm may be generally less efficient than other algorithms.

2)  The function, the gradient and the hessian values are required.
    The hessian requirement severely limits the usefulness of the
    algorithm. While this requirement was partially relieved by
    approximating the hessian by differences, this scheme has three
    disadvantages. First, for problems that can be solved by other
    algorithms in a number of iterations about equal to the number
    of independent variables, the VO algorithm is likely to be less
    efficient, as was indeed the case with some numerical examples
    given. Second, for problems with a large number of independent
    variables, other algorithms may yield better approximate
    solutions in a fixed amount of computer time. Third, obtaining an
    accurate hessian approximation by differences requires even
    smoother functions than twice continuously differentiable ones.

3)  Computer storage requirements are greater for the VO algorithm
    than for most other existing algorithms.

These disadvantages of the VO algorithm must be considered when using
it to solve a given class of problems.

Another contribution of this research is the VO iterative method
as applied to transient analysis of circuits, described in Chapter 5.
Modest savings were realized in an already extremely efficient general
purpose program. It is estimated that the savings to other existing
transient analysis programs could be very substantial. If an existing
program does not use sparse matrix methods in the implementation of
Newton's iteration, it is conjectured that implementing the VO
iteration could produce savings of similar magnitude to those obtained
by implementing sparse matrix methods, and the VO iteration should
require less effort to implement than a sparse matrix method.

Other minor contributions of this research may be summarized as
follows:

1)  A new method for describing minimization algorithms to
    accommodate the curved trajectories of the VO algorithm. Some of
    the existing theorems were extended to the new description.

2)  An apparently novel scheme for approximating the first and
    second derivatives, the gradient and the hessian, of a scalar
    function. The proposed scheme takes into consideration errors
    that may be present in the function values used in the
    approximations.

3)  A modification to an existing procedure for computing the
    Cholesky factorization of the hessian [36] which changes the
    hessian in a minimal manner when it is not positive definite.

4)  A potentially useful Taylor series expansion of the solution
    point of a system of nonlinear equations. Different forms of
    the series, when truncated, yield different iterative methods,
    which were called tailored iterative methods. A tailored
    iterative method was shown to be very useful for some problems.

6.2  Future Research Suggestions

The derivation of the Taylor series expansion of a solution
involves an arbitrary function g(z). In the VO algorithm for
minimization, the g(z) function used was

$$g(z) = (z_1^p, z_2^p, \ldots, z_n^p)^T. \tag{6.1}$$

It was also shown that other g(z) functions produced different Taylor
series. For example, for p = 1 the g(z) function (6.1) produces the
well known Euler series [19]. It was also shown in Chapter 5 how other
functions produce iterative methods which were sometimes useful. Can
other g(z) functions be found that are more useful? For example, in
minimization the g(z) function given by

$$g(z) = (z_1^{p_1}, z_2^{p_2}, \ldots, z_n^{p_n})^T \tag{6.2}$$

would yield not a scalar problem at each iteration as (6.1) does, but
rather a problem in $p_1, p_2, \ldots, p_n$ space. The minimization problem
might be solved in this new space in a simpler manner, however.

It appears to be impossible to eliminate the smoothness condition
of twice continuous differentiability needed to ensure global
convergence of the VO algorithm. However, perhaps other hessian
approximation techniques can be used. Attempts at using the Fletcher and
Powell [22] rank-2 method of approximating the hessian inverse did not
prove successful, as reported in Chapter 4. Other quasi-Newton methods,
such as [7,21], are also likely to be unsuccessful. However, perhaps the
difference approximations to the hessian can be improved. For example,
a scheme might be devised to update only those parts of the hessian
which are changing the most from the last iteration by using information
from the third-order and the fourth-order derivative terms, which are
obtained as part of the algorithm at each iteration.

Finally, it may be possible to combine the VO algorithm with other
minimization algorithms to obtain a better overall algorithm for some
classes of functions.


APPENDIX  I 

DESCRIPTION  OF  COMPUTER  PROGRAM  IMPLEMENTING 
THE  VARIABLE-ORDER  ALGORITHM 


In this appendix the program, written in FORTRAN IV, which
implements the Variable-Order (VO) algorithm and three other algorithms
is described. A description of how to use the program is given first,
followed by a brief description of each subroutine, the principal
arrays, variables and the common blocks in the program.

I.1  Using the Program

First a subroutine must be written to supply the function, the
gradient and the hessian, or the function and the gradient, or only the
function (it is recommended that, if possible, both the gradient and the
hessian be supplied). The program calls subroutine FUNC as follows

      CALL FUNC(X, F)

to obtain the value of the function in F at the point x, denoted by the
array X. When the gradient is supplied, the program calls subroutine
GRAD, when it requires the function and the gradient, as follows

      CALL GRAD(X, F, GF)

where GF is the array to contain the gradient. When the hessian is
being supplied, the program calls HESS to obtain it as follows

      CALL HESS(X, G2F)

where G2F is the array to contain the hessian stored row-wise, beginning
each row with the diagonal; thus the number of elements of G2F to be
supplied is equal to n(n + 1)/2, where n is the number of independent
variables. An example of how a subroutine may be written to satisfy the
above call statements is shown in Fig. I.1 for Rosenbrock's function,
which is given by

$$f(x) = 100(x_2 - x_1^2)^2 + (1 - x_1)^2,$$

with its gradient given by

$$f'(x) = \bigl(-400x_1(x_2 - x_1^2) - 2(1 - x_1),\; 200(x_2 - x_1^2)\bigr)^T$$

and its hessian given by

$$f''(x) = \begin{pmatrix} 1200x_1^2 - 400x_2 + 2 & -400x_1 \\ -400x_1 & 200 \end{pmatrix}.$$
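For readers who want to check the formulas for Rosenbrock's function outside FORTRAN, the following Python sketch evaluates the function, its gradient, and its hessian in the packed row-wise layout (each row beginning with its diagonal element). The function names and the 0-based `packed_index` helper are illustrative assumptions and are not part of the thesis program.

```python
def rosenbrock(x):
    """Rosenbrock's function f(x) = 100 (x2 - x1^2)^2 + (1 - x1)^2."""
    t = x[1] - x[0] ** 2
    return 100.0 * t * t + (1.0 - x[0]) ** 2


def rosenbrock_grad(x):
    """Gradient of Rosenbrock's function."""
    t = x[1] - x[0] ** 2
    return [-400.0 * x[0] * t - 2.0 * (1.0 - x[0]), 200.0 * t]


def rosenbrock_hess_packed(x):
    """Hessian stored row-wise, each row starting at its diagonal."""
    return [1200.0 * x[0] ** 2 - 400.0 * x[1] + 2.0,   # element (1,1)
            -400.0 * x[0],                             # element (1,2)
            200.0]                                     # element (2,2)


def packed_index(i, j, n):
    """0-based position of element (i, j), with i <= j, in the packed array."""
    return i * n - i * (i - 1) // 2 + (j - i)


def central_difference_grad(f, x, h=1e-6):
    """Finite-difference gradient, used only to cross-check the formulas."""
    grad = []
    for k in range(len(x)):
        up = list(x)
        dn = list(x)
        up[k] += h
        dn[k] -= h
        grad.append((f(up) - f(dn)) / (2.0 * h))
    return grad


start = [-1.2, 1.0]   # a commonly used starting point for this function
print(rosenbrock(start), rosenbrock_grad(start), rosenbrock_hess_packed(start))
```

For n = 2 the packed array has n(n + 1)/2 = 3 entries, matching the three G2F elements that a HESS routine must supply.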


Observe  that  the  variable  ITCNT,  the  iteration  counter,  can  be  accessed 
via  the  common  block  OPINDI;  in  Fig.  I.l  this  variable  is  used  to  print 
a  heading.   Any  of  the  other  variables  in  the  common  blocks  to  be 
described  later  can  of  course  be  similarly  accessed  for  any  other  pur- 
pose.  While  it  is  convenient  to  make  HESS  an  entry  instead  of  a  separate 
      SUBROUTINE GRAD(X, F, GF)
      IMPLICIT REAL*8(A-H,O-Z)
      REAL*8 X(2),GF(2),G2F(2)
C  NOTE, ABOVE DIMENSIONS ARE DUMMY
      COMMON /OPINDI/ IDUM(3),ITCNT
C  THE ABOVE ALLOWS THE MONITORING OF
C  ITCNT TO PRINT HEADINGS
      ASSIGN 120 TO IG
      GO TO 110
      ENTRY FUNC(X, F)
      ASSIGN 150 TO IG
  110 IF (ITCNT.EQ.0) WRITE(6, 115)
  115 FORMAT('1 ROSENBROCK PROBLEM'//)
      U = X(1)**2
      T = X(2) - U
      F = 1D2*T*T + (1D0 - X(1))**2
      GO TO IG,(120, 150)
C  THE GRADIENT ALSO DESIRED...
  120 GF(1) = -(4D2*T*X(1) + 2D0*(1D0 - X(1)))
      GF(2) = 2D2*T
      GO TO 150
C  THE HESSIAN ENTRY
      ENTRY HESS(X, G2F)
      U = X(1)**2
      G2F(1) = 12D2*U + 2D0 - 4D2*X(2)
      G2F(2) = -4D2*X(1)
      G2F(3) = 2D2
  150 RETURN
      END


Figure I.1 Example of subroutine to supply the function, the gradient
and/or the hessian to the VO algorithm. Note that in the
ENTRY HESS any variables that are defined in the other
entries must not be used. For example, the variable U
defined after statement number 115 must not be used in
evaluating any of the G2F elements without defining it
again.
subroutine, it should be treated as a separate subroutine. This means
that any variables defined while evaluating the function and/or the
gradient in the entries FUNC and/or GRAD should not be assumed to be
defined in the entry HESS. Finally, if the hessian is not to be
supplied, the entry HESS should still be defined to avoid errors from
the linkage editor (i.e., only the ENTRY statement is entered, without
any G2F statements under it); the same is true when only the function
values are to be supplied.

Once the subroutine supplying the information about the function to
be minimized has been written, it must be compiled and linked along with
the rest of the program. Figure I.2 shows a typical deck setup for
compiling, loading and executing the program on a typical IBM computer.
Repeated executions are more efficient if the program is compiled
once and the object deck saved in a permanent file. The input to
the program is through a NAMELIST named &IN, illustrated in Fig. I.2.
If a different input format is desired, the MAIN program must be
changed. The program also has a limit of 10 on the number of
independent variables, so the array dimensions have to be changed to
increase this limit.

Most of the various options and variables to execute the VO algorithm
were given in Section 4.5; there are some additional options and
variables as follows:

IDBUG (default is 0) debug switch. When set to -1, almost every
subroutine prints information about its inputs when called,
and the function, gradient, and hessian values are printed
every time an evaluation is obtained, etc. When set to +1,
even more debug output is obtained; for example intermediate
//ROSEN JOB (...installation accounting...),JIMENEZ
//STEP1 EXEC FORTGCG
//FORT.SYSIN DD *

(FORTRAN IV subroutines with VO algorithm)

      SUBROUTINE GRAD(X, F, GF)

(subroutine like the one in Fig. I.1)

/*
//GO.OBJECT DD *

(object deck of NTIMEU, the assembly language function)

/*
//GO.SYSIN DD *
 &IN NX = 2, X = -1.2, 1, MAXAV = 3, &END
/*


Figure I.2 Typical setup for executing the VO algorithm on an IBM
computer with standard catalog procedures.
results of the factorization of the hessian are displayed.
MAXORD (default is 4) the maximum transformation order to use
(must be less than or equal to 4). When the algorithm selected
is the Davidon-Fletcher-Powell (DFP) algorithm (by setting
IDFPSW = 1, see below) and MAXORD is greater than 2, the VO
algorithm is used every n + 1 iterations (i.e., on the assumption
that an approximation of the hessian inverse is available); however,
it is recommended that MAXORD be set to 2 whenever IDFPSW = 1 is
used to select the DFP algorithm.
IPRSW  (default  is  1)  print  switch.   Normally  after  each  iteration 
the  value  of  the  function,  the  gradient  and  the  independent 
variables  are  printed  along  with  the  principal  counters.   If 
IPRSW  is  set  to  0  only  the  initial  and  final  values  are 
printed.   If  IPRSW  is  set  to  -1,  the  value  of  the  gradient 
is  not  printed  at  each  iteration. 
IDFPSW (default is 0) switch to use the Davidon-Fletcher-Powell
algorithm when set to 1 (see MAXORD above for a related
variable). The variable RELSCH controls the accuracy of the
scalar search (see Section 4.5).
ICONJG (default is 0) switch to use the conjugate gradients
algorithm of Fletcher-Reeves or the steepest descent algorithm.
If ICONJG is set to 1 and MAXORD is greater than 1, then the
Fletcher-Reeves algorithm is selected. If ICONJG is set to 1
and MAXORD is equal to 1, then steepest descent is selected.
The variable RELSCH controls the accuracy of the scalar search
(see Section 4.5).
ISEED (default is 12748) general purpose variable in common
block OPINDI, available to the user subroutine for any purpose.
For example, in implementing Rosenbrock's problem with
errors in the function and the gradient, as done for the problem
in Section 4.6.9, this variable was used to set the seed for
the random number generator.
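
The way the switches IDFPSW, ICONJG and MAXORD combine can be summarized by the following Python sketch. It is only an illustration of the option descriptions above, not code from the program: the function name `select_algorithm` and the returned labels are hypothetical, the lower limit of 1 on MAXORD is assumed, and the text does not state which switch takes precedence when both IDFPSW and ICONJG are set to 1 (here IDFPSW is tested first, which is an assumption).

```python
def select_algorithm(idfpsw=0, iconjg=0, maxord=4):
    """Return a label for the method implied by the option switches.

    ASSUMPTIONS: IDFPSW is checked before ICONJG, and MAXORD must lie
    between 1 and 4; the option descriptions only state the upper limit.
    """
    if not 1 <= maxord <= 4:
        raise ValueError("MAXORD must be between 1 and 4")
    if idfpsw == 1:
        if maxord > 2:
            # The VO step is taken every n + 1 iterations in this case.
            return "DFP with a VO step every n + 1 iterations"
        return "DFP"
    if iconjg == 1:
        # MAXORD distinguishes Fletcher-Reeves from steepest descent.
        return "Fletcher-Reeves" if maxord > 1 else "steepest descent"
    return "VO"
```

With the defaults (IDFPSW = 0, ICONJG = 0, MAXORD = 4) the sketch returns "VO", matching the default behavior described in the list.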

The program loops to read more input, if there is any, after it
finishes with a set of data; this allowed several options to be tested
in one execution for the results given in this research. Some examples
of possible input data are the following:


 &IN NX = 2, X = -1.2, 1, MAXAV = 2, BOX(1,1) = -1.3,
 BOX(3,1) = .5, BOX(1,2) = 0, BOX(3,2) = 2, &END


which specifies the dimension as 2, the initial guess $x^0 = (-1.2, 1)^T$,
that the function and the gradient values are supplied, and the box
constraints $-1.3 \le x_1 \le 0.5$ and $0 \le x_2 \le 2$;


 &IN NX = 4, X = 1, 1.5, 1.5, 1, MAXAV = 2, FABS = 5D-5,
 FREL = 5D-4, GFABS = 5D-5, GFREL = 5D-4, BOX(1,1) = .1,
 BOX(3,1) = 10, BOX(1,2) = .1, BOX(3,2) = 10, BOX(1,3) = .1,
 BOX(3,3) = 10, BOX(1,4) = .1, BOX(3,4) = 10, &END


which specifies a four-dimensional problem with initial guess
$x^0 = (1, 1.5, 1.5, 1)^T$, that the function and the gradient are supplied
but with errors, that both absolute errors are $5 \times 10^{-5}$ and both
relative errors are $5 \times 10^{-4}$, and that the box constraints on all
the independent variables are the same, $0.1 \le x_i \le 10$, $i = 1, 2, 3, 4$.
Observe that in NAMELIST input the &IN must begin in column 2 of the card
or line, and column 1 must be blank on all cards or lines.
I.2 Description of the Program

There are three common blocks, OPINDI, OPINDR, and DPDATA. The
common block OPINDI contains the integer variables and is defined as follows:


      COMMON /OPINDI/ IDBUG,NX,NG2F,ITCNT,MAXAV,MAXORD,MAXIT,IORD,
     *  IP(10),NFS,NGFS,NHESS,NBOUND,IDFPSW,ICONJG,IRESET,ICOORD,ISEED


where most of the variables have been previously explained. The ones
that have not been explained are the following: NG2F is the number of
elements in the compressed hessian, which is stored in G2F; it is
(NX*(NX+1))/2; IORD contains the transformation order used at each
iteration; IP(10) is the permutation used at each iteration in the
factorization of the hessian; NFS, NGFS, NHESS are the counters for
the number of function, gradient and hessian evaluations; NBOUND
contains the number of independent variables at their boundaries for the
last point used; IRESET is used in resetting the conjugate gradients
algorithm of Fletcher-Reeves; ICOORD contains the last coordinate
direction used, or zero if the last iteration was not a coordinate
direction.

The  common  block  OPINDR  contains  the  real,  double  precision, 
variables  and  it  is  defined  as  follows: 


      COMMON /OPINDR/ PREL,PABS,STPEPS,P,PMIN,G2F(45),C(10,4),BOUND,
     *  POS,FREL,FABS,GFREL,GFABS,PHREL,PHABS,HG2FII(10),PERT(10),
     *  SPERT(10),STPAPS,RELSCH


where most of the variables have also been defined and explained
previously. The ones that have not been explained previously are the
following: P is the step-length in the transformation functions at each
iteration; PMIN is used to save the value of p which minimizes the
function in the scalar search, to be used in any subsequent iterations;
C(10,4) is used to contain the transformation function coefficient
vectors; BOUND contains the distance of the projected transformation
from the actual transformation when the trajectories fall outside the
box constraints, thus it is a measure of how much the actual
transformation is being projected; POS contains the maximum norm of the
diagonal matrix D obtained in the factorization of the hessian (see
Section 3.5); HG2FII(10) contains the diagonal elements of the hessian
whenever the hessian is being approximated by differences; PERT(10)
contains the perturbations used for the last gradient approximation,
whenever only function values are supplied; SPERT(10) is a temporary
array to hold the perturbations in some instances (see subroutine
OPXKP1); STPAPS is set to the square root of STPEPS, which is used in
the convergence tests as the relative constant.

The  last  common  block  is  DPDATA  and  it  is  defined  as  follows: 

      COMMON /DPDATA/ BOX(3,26)

where BOX contains the box constraints as previously described. BOX is
dimensioned as shown because that is how it is defined in the computer
program AOP, in which the VO algorithm replaced the original
minimization algorithm. To be consistent, BOX should be defined as
BOX(3,10). Any increase in the number of independent variables could
then be achieved by increasing all the dimensions of 10 in the common
blocks and changing the dimension of G2F to n(n + 1)/2, where n is the
maximum number of variables. Two other dimension statements also need
to be changed, one in the MAIN program and another in subroutine
OPXKP1, both from 10 to whatever number of variables is desired.

We can now describe the function of each of the subroutines in the
program. Each subroutine will be described by first briefly stating its
function, and then listing the subroutines that call it and the
subroutines that it calls.

MAIN - This is the main program, which reads the data, determines
when the algorithm has converged and prints the results. It
calls NTIMEU, OPGRAD, OPXKP1, OPFXGF, OPGFCK, OPEVX, and
OPCOOR.

OPXKP1 - This subroutine computes the transformation functions and
returns the next point, which reduces the function being
minimized. It is called by the MAIN program. It calls OPEVX,
OPFXGF, OPORDH, OPGFCK, OPORD2, OPCNJG, OPGRAD, OPFUNC, OPHESS,
and SFBSUB.

OPDFP - This subroutine handles the Davidon-Fletcher-Powell
algorithm. It is called by OPHESS. It calls OPBDRY, OPFUNC, and
OPEVX.

OPCOOR - This subroutine handles the selection of coordinate search
directions. It is called by the MAIN program. It calls
OPEVX, OPFUNC, and OPBDRY.

OPEVX - This subroutine evaluates a point $x^{k+1}$ given a value of p,
and it takes care of in effect projecting the transformation
when the point falls outside the box constraints. It is called
by almost every subroutine. It does not call any subroutine.

OPGFCK - This function computes the maximum norm of the gradient
without taking into consideration those coordinates that are
at the boundary. It is called by MAIN and OPXKP1. It does
not call any subroutine.
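
The quantity computed by OPGFCK can be illustrated with the short Python sketch below. It is a hedged reading of the description above, not a transcription of the FORTRAN function: the name `free_gradient_max_norm` and its argument list are hypothetical, and a coordinate is taken to be "at the boundary" whenever it equals (or lies beyond) either of its box limits, regardless of the sign of the gradient component, which is an assumption.

```python
def free_gradient_max_norm(x, gradient, lower, upper):
    """Maximum norm of the gradient over coordinates not at a box boundary.

    ASSUMPTION: coordinate i is 'at the boundary' when x[i] <= lower[i]
    or x[i] >= upper[i]; the sign of gradient[i] is not examined.
    """
    norm = 0.0
    for xi, gi, lo, hi in zip(x, gradient, lower, upper):
        if xi <= lo or xi >= hi:
            continue  # skip coordinates sitting on a bound
        norm = max(norm, abs(gi))
    return norm
```

For example, with the box $-1.3 \le x_1 \le 0.5$, $0 \le x_2 \le 2$ and the point (0.5, 1), only the second gradient component contributes to the norm.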

OPORD2 - This subroutine handles the second-order transformation
scalar search. It is called by OPXKP1. It calls OPEVX,
OPBDRY, OPFUNC, and OPGRAD.

OPORDH - This subroutine handles the scalar search for the third
and fourth order transformations. It is called by OPXKP1.
It calls OPFUNC, OPEVX, and OPBDRY.

OPBDRY - This subroutine solves the scalar minimization problem
for the Davidon-Fletcher-Powell, the conjugate gradients, and
the steepest descent algorithms. It is also called when the
transformations are projected, and when coordinate searches
are used. It is called by OPDFP, OPCNJG, OPORD2, OPORDH, and
OPCOOR. It calls OPEVX and OPFUNC.

OPCNJG - This subroutine handles the conjugate gradient and
steepest descent algorithms. It is called by OPXKP1. It calls
OPBDRY, OPEVX, and OPFUNC.

OPGRAD - This subroutine and its entry OPFUNC handle the calls to
the user-written subroutines to evaluate the gradient and the
function. It also approximates the gradient whenever it is
not supplied (MAXAV = 1). OPGRAD is called by MAIN, OPXKP1,
and OPORD2; OPFUNC is called by almost all the subroutines.
Subroutines called are FUNC and GRAD.

OPFUNC  -  See  OPGRAD. 

OPHESS - This subroutine handles the evaluation or approximation of
the hessian and its factorization, or the hessian update
whenever the Davidon-Fletcher-Powell algorithm is being used.
If the hessian is not supplied with the VO algorithm, it also
handles its approximation. It is called by OPXKP1. It calls
HESS, OPGRAD, OPFUNC, SFAC, and OPDFP.

OPFXGF - This subroutine corrects the forward difference gradient
by adding the term equal to one half times the hessian
diagonals, times the perturbations used (see Section 4.3.2). It
is called by OPXKP1 and MAIN. It does not call any subroutines.
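
The effect of this correction can be sketched in Python as follows. A Taylor expansion gives $(f(x + h_i e_i) - f(x))/h_i = g_i + \tfrac{1}{2} h_i H_{ii} + O(h_i^2)$, so for a perturbation applied as $x_i + h_i$ the term $\tfrac{1}{2} h_i H_{ii}$ enters with a minus sign in the corrected gradient. The sign convention actually used for the perturbations in OPFXGF is not reproduced here; the function names and the choice of Rosenbrock's function as test case are illustrative assumptions.

```python
def rosenbrock(x):
    """f(x) = 100 (x2 - x1^2)^2 + (1 - x1)^2, used only as a test function."""
    t = x[1] - x[0] ** 2
    return 100.0 * t * t + (1.0 - x[0]) ** 2


def forward_difference_gradient(f, x, h):
    """Approximate each gradient component by (f(x + h_i e_i) - f(x)) / h_i."""
    f0 = f(x)
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xp[i] += h[i]  # perturb one coordinate at a time
        grad.append((f(xp) - f0) / h[i])
    return grad


def corrected_gradient(fd_grad, hess_diag, h):
    """Remove the leading error term 0.5 * h_i * H_ii from each component.

    ASSUMPTION: perturbations are applied as x_i + h_i, so the correction
    term is -0.5 * h_i * H_ii.
    """
    return [g - 0.5 * d * hi for g, d, hi in zip(fd_grad, hess_diag, h)]
```

At the point (-1.2, 1) with perturbations of 0.001, the uncorrected first component is in error by about 0.66, while the corrected one agrees with the exact value -215.6 to within about 0.001.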

SFAC  -  This  subroutine  handles  the  factorization  of  the  hessian  as 
described  in  Section  3.5.   It  is  called  by  OPHESS.   It  does 
not  call  any  subroutines. 

SFBSUB - This subroutine handles the forward and back substitution
for the higher-order correction and, if the Davidon-Fletcher-
Powell algorithm is being used, it multiplies the approximate
hessian inverse times the gradient. It is called by OPXKP1.
It does not call any subroutines.

NTIMEU  -  This  is  a  function  written  in  IBM/370  assembly  language 
which  returns  the  elapsed  time.   It  is  called  by  MAIN.   It 
does  not  call  any  subroutine. 
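
The call relationships listed above can be collected into a small table and inverted to answer "which routines call X?". The Python sketch below is only a convenience for reading the list; the dictionary `CALLS` and the helper `callers_of` are illustrative names, the entries are transcribed from the "It calls ..." sentences, and OPFUNC appears only as a callee because it is an entry of OPGRAD.

```python
# Callees of each routine, transcribed from the descriptions above.
# Routine names are spelled as in each routine's own entry in the list.
CALLS = {
    "MAIN": ["NTIMEU", "OPGRAD", "OPXKP1", "OPFXGF", "OPGFCK", "OPEVX", "OPCOOR"],
    "OPXKP1": ["OPEVX", "OPFXGF", "OPORDH", "OPGFCK", "OPORD2", "OPCNJG",
               "OPGRAD", "OPFUNC", "OPHESS", "SFBSUB"],
    "OPDFP": ["OPBDRY", "OPFUNC", "OPEVX"],
    "OPCOOR": ["OPEVX", "OPFUNC", "OPBDRY"],
    "OPEVX": [],
    "OPGFCK": [],
    "OPORD2": ["OPEVX", "OPBDRY", "OPFUNC", "OPGRAD"],
    "OPORDH": ["OPFUNC", "OPEVX", "OPBDRY"],
    "OPBDRY": ["OPEVX", "OPFUNC"],
    "OPCNJG": ["OPBDRY", "OPEVX", "OPFUNC"],
    "OPGRAD": ["FUNC", "GRAD"],  # user-written routines
    "OPHESS": ["HESS", "OPGRAD", "OPFUNC", "SFAC", "OPDFP"],
    "OPFXGF": [],
    "SFAC": [],
    "SFBSUB": [],
    "NTIMEU": [],
}


def callers_of(name):
    """Return the sorted list of routines whose callee list contains name."""
    return sorted(caller for caller, callees in CALLS.items() if name in callees)
```

For instance, `callers_of("OPBDRY")` lists OPCNJG, OPCOOR, OPDFP, OPORD2 and OPORDH, in agreement with the description of OPBDRY.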

All of the above subroutines require approximately fifty kilobytes when
compiled and linked with all the required FORTRAN IV built-in functions.
It is estimated that the storage requirements could be cut by one third
by eliminating the options and the subroutines required for the
Davidon-Fletcher-Powell and conjugate gradient algorithms, and by
eliminating the numerous print statements used for debugging output.